From kvn at openjdk.org Thu May 1 00:07:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 00:07:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:54:38 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> address Ioi's comments > > src/hotspot/share/code/aotCodeCache.cpp line 69: > >> 67: vm_abort(false); >> 68: } >> 69: log_info(aot, codecache, exit)("Unable to create AOT Code Cache."); > > Same here (`log_warning`?). I will remove `exit_vm_on_` from these 2 methods to avoid confusion. And will add comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069684684 From kvn at openjdk.org Thu May 1 00:32:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 00:32:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 02:05:41 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > address Ioi's comments > How does _cache notice a store failure during dumping phase? `set_failed()` sets flags AOTCodeCache::_failed. 
Which is checked when we are trying to get the _cache pointer:

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2843795345

From kvn at openjdk.org Thu May 1 00:04:48 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 May 2025 00:04:48 GMT
Subject: RFR: 8350209: Preserve adapters in AOT cache [v12]
In-Reply-To:
References:
Message-ID: <1NHR78vH371vWOdTep3F2l0sql9M-qJ_TIMUyAAImS4=.2ece512b-51f9-47ca-aae7-d600806f53cc@github.com>

On Wed, 30 Apr 2025 22:54:07 GMT, Vladimir Ivanov wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>> address Ioi's comments
>
> src/hotspot/share/code/aotCodeCache.cpp line 60:
>
>> 58: vm_exit_during_initialization("Unable to use AOT Code Cache.", nullptr);
>> 59: }
>> 60: log_info(aot, codecache, init)("Unable to use AOT Code Cache.");
>
> Should it be a warning instead?

changed

> src/hotspot/share/code/codeBlob.hpp line 208:
>
>> 206: CodeBlob* as_codeblob() const { return (CodeBlob*) this; }
>> 207: AdapterBlob* as_adapter_blob() const { assert(is_adapter_blob(), "must be adapter blob"); return (AdapterBlob*) this; }
>> 208: ExceptionBlob* as_exception_blob() const { assert(is_exception_stub(), "must be exception stub"); return (ExceptionBlob*) this; }
>
> `ExceptionBlob` is C2-specific, but `as_exception_blob()` is unused.
removed

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069680668
PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069680012

From kvn at openjdk.org Thu May 1 00:26:47 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 May 2025 00:26:47 GMT
Subject: RFR: 8350209: Preserve adapters in AOT cache [v12]
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 22:59:03 GMT, Vladimir Ivanov wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>> address Ioi's comments
>
> src/hotspot/share/runtime/sharedRuntime.cpp line 2852:
>
>> 2850: entry_offset[2] = handler->get_c2i_unverified_entry() - i2c_entry;
>> 2851: entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry;
>> 2852: AOTCodeCache::store_code_blob(*adapter_blob, AOTCodeEntry::Adapter, id, name, AdapterHandlerEntry::ENTRIES_COUNT, entry_offset);
>
> What is the intended behavior here when `AOTCodeCache::store_code_blob` fails?

If something goes wrong when we try to store a blob (an adapter in this case) into the buffer (for example, if the buffer reserved for AOT code is too small), the following will be called:

    set_failed();
    exit_vm_on_store_failure();

So we either abort the VM based on the flag, or issue a warning (as you suggested) and continue execution.

`set_failed()` will prevent subsequent attempts to cache blobs and prevent the final dump of any cached code into the AOT cache:

    bool for_dump() const { return _for_dump && !_failed; }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069703345

From kvn at openjdk.org Thu May 1 01:08:40 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 May 2025 01:08:40 GMT
Subject: RFR: 8350209: Preserve adapters in AOT cache [v13]
In-Reply-To:
References:
Message-ID:

> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance.
> > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. > > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. 
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Address Vladimir's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/78f2828d..70bd0294 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=11-12 Stats: 38 lines in 6 files changed: 0 ins; 19 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From kvn at openjdk.org Thu May 1 00:55:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 00:55:56 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v7] In-Reply-To: References: Message-ID: On Thu, 24 Apr 2025 01:53:55 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix message > > Finished the first pass over the code. > > > Overall, looks good. Some feedback follows. Thank you, @iwanowww, for review. I think I addressed your comments and answered your question. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2843828077 From vlivanov at openjdk.org Thu May 1 01:56:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 1 May 2025 01:56:48 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v13] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 01:08:40 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address Vladimir's comments Looks good! Minor suggestions follow. 
src/hotspot/share/code/aotCodeCache.cpp line 56:

> 54: #include
> 55:
> 56: static void load_failure() {

Maybe `on_(load|store)_failure()` or `report_(load|store)_failure()`?

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2809000158
PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069761472

From fyang at openjdk.org Thu May 1 00:59:49 2025
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 1 May 2025 00:59:49 GMT
Subject: RFR: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 14:46:23 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple improvement?
>
> By rvv spec,
> "integer compare instructions write 1 to the destination mask register element if the comparison evaluates to true, and 0 otherwise."
> "These vector FP compare instructions compare two source operands and write the comparison result to a mask register. "
>
> So, it's not always necessary to clear the mask register before vector comparison operation, e.g. when `vm != Assembler::v0_t`.
>
> Thanks!

Thanks!

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24968#pullrequestreview-2808915411

From kvn at openjdk.org Thu May 1 02:24:48 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 May 2025 02:24:48 GMT
Subject: RFR: 8350209: Preserve adapters in AOT cache [v13]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 01:53:20 GMT, Vladimir Ivanov wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Address Vladimir's comments
>
> src/hotspot/share/code/aotCodeCache.cpp line 56:
>
>> 54: #include
>> 55:
>> 56: static void load_failure() {
>
> Maybe `on_(load|store)_failure()` or `report_(load|store)_failure()`?
I will use `report_`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069775755

From vlivanov at openjdk.org Thu May 1 01:56:50 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Thu, 1 May 2025 01:56:50 GMT
Subject: RFR: 8350209: Preserve adapters in AOT cache [v12]
In-Reply-To:
References:
Message-ID: <6QqEVIYBum6lJLSUHs_yZ-Zbk1vEDXqqQBYM6usjoJ8=.295c3db4-c458-42c9-a5ea-adc585a645d0@github.com>

On Thu, 1 May 2025 00:24:07 GMT, Vladimir Kozlov wrote:

>> src/hotspot/share/runtime/sharedRuntime.cpp line 2852:
>>
>>> 2850: entry_offset[2] = handler->get_c2i_unverified_entry() - i2c_entry;
>>> 2851: entry_offset[3] = handler->get_c2i_no_clinit_check_entry() - i2c_entry;
>>> 2852: AOTCodeCache::store_code_blob(*adapter_blob, AOTCodeEntry::Adapter, id, name, AdapterHandlerEntry::ENTRIES_COUNT, entry_offset);
>>
>> What is the intended behavior here when `AOTCodeCache::store_code_blob` fails?
>
> If something goes wrong when we try to store a blob (an adapter in this case) into the buffer (for example, if the buffer reserved for AOT code is too small), the following will be called:
>
>     set_failed();
>     exit_vm_on_store_failure();
>
> So we either abort the VM based on the flag, or issue a warning (as you suggested) and continue execution.
>
> `set_failed()` will prevent subsequent attempts to cache blobs and prevent the final dump of any cached code into the AOT cache:
>
>     bool for_dump() const { return _for_dump && !_failed; }

Maybe add an assert here?
bool success = AOTCodeCache::store_code_blob(...); assert(success || !AOTCodeCache::is_dumping_adapters(), ""); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2069760236 From kvn at openjdk.org Thu May 1 02:47:39 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 02:47:39 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v14] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: renaming and new assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/70bd0294..9c3e3688 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=12-13 Stats: 10 lines in 2 files changed: 1 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From vlivanov at openjdk.org Thu May 1 02:54:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 1 May 2025 02:54:50 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v14] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 02:47:39 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > renaming and new assert Marked as reviewed by vlivanov (Reviewer). 
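
As a rough illustration of the store-failure handling discussed in this thread (a sticky `_failed` flag that disables further caching, plus an optional abort), here is a minimal standalone sketch. It is not HotSpot code: `ToyAOTCodeCache`, `store_blob`, `used()`, the size parameters, and the `abort_on_failure` field (standing in for `-XX:+AbortVMOnAOTCodeFailure`) are hypothetical names chosen for the example. Only `set_failed()` and the shape of `for_dump()` mirror what was quoted in the review.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Toy model, NOT the HotSpot AOTCodeCache class. Every name other than
// set_failed() and for_dump() is hypothetical.
class ToyAOTCodeCache {
  bool   _for_dump;
  bool   _failed = false;
  size_t _capacity;       // stands in for the buffer sized by AOTCodeMaxSize
  size_t _used = 0;

public:
  bool abort_on_failure;  // stands in for -XX:+AbortVMOnAOTCodeFailure

  ToyAOTCodeCache(bool for_dump, size_t capacity, bool abort_flag)
    : _for_dump(for_dump), _capacity(capacity), abort_on_failure(abort_flag) {}

  void set_failed() { _failed = true; }

  // Same shape as the predicate quoted in the review.
  bool for_dump() const { return _for_dump && !_failed; }

  size_t used() const { return _used; }

  // Returns true if the blob was recorded. After the first failure every
  // later call returns false without touching the buffer.
  bool store_blob(size_t blob_size) {
    if (!for_dump()) {
      return false;
    }
    if (blob_size > _capacity - _used) {
      set_failed();
      if (abort_on_failure) {
        std::abort();
      }
      std::fprintf(stderr, "warning: unable to store blob in toy AOT code cache\n");
      return false;
    }
    _used += blob_size;
    return true;
  }
};
```

In this model, the assert suggested during review corresponds to checking `success || !cache.for_dump()` after each `store_blob` call: a failed store must leave the cache in the "not dumping" state.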
-------------

PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2809064479

From kvn at openjdk.org Thu May 1 02:58:53 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 1 May 2025 02:58:53 GMT
Subject: RFR: 8350209: Preserve adapters in AOT cache [v14]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 02:51:53 GMT, Vladimir Ivanov wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>> renaming and new assert
>
> Marked as reviewed by vlivanov (Reviewer).

Thank you, @iwanowww.

I need 2 re-reviews so I asked @ashu-mehra to review again. Meanwhile I am re-running testing with latest mainline code.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2843969503

From epeter at openjdk.org Thu May 1 06:23:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 06:23:01 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v52]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 06:18:07 GMT, Emanuel Peter wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>> alignment note
>
> src/hotspot/share/opto/rangeinference.cpp line 99:
>
>> 97: //
>> 98: // In practice, since the algorithm always ensures that the returned value
>> 99: // satisfies bits, we only need to check if it is not less than lo.
>
> Ah nice. Ok. So now we just have to prove that the result satisfies bits in all cases :)

I'll check the proofs again to see if this is clear enough later.
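
The rounding step discussed in this review thread, where `lo & -alignment` rounds `lo` down to a multiple of a power-of-two `alignment` and adding `alignment` then rounds up, can be sketched in isolation as follows. This is a hedged illustration only: `round_up_unaligned` is a hypothetical helper written for this note, not a function from `rangeinference.cpp`, and it assumes 32-bit unsigned arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from rangeinference.cpp.
// Preconditions: alignment is a power of two and lo is NOT a multiple of it.
inline uint32_t round_up_unaligned(uint32_t lo, uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  assert((lo & (alignment - 1)) != 0);
  // For a power of two, (0 - alignment) has every bit set from the alignment
  // bit upwards, so the AND clears all lower bits (rounds lo down).
  uint32_t rounded_down = lo & (0u - alignment);
  // Unsigned addition wraps around to 0 if no such value fits in 32 bits.
  return rounded_down + alignment;
}
```

For instance, with `lo = 13` (`0b1101`) and `alignment = 4`, the AND yields 12 and the result is 16. When the rounded-down value already has the top bit set and `alignment` is `0x80000000`, the addition wraps to 0, which matches the overflow case raised in the discussion.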
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069894004

From jbhateja at openjdk.org Thu May 1 06:23:45 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 1 May 2025 06:23:45 GMT
Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts
In-Reply-To:
References:
Message-ID:

On Tue, 15 Apr 2025 03:54:09 GMT, Volodymyr Paprotski wrote:

> It looks like the `permv` mask isn't always 'all-ones' or 'all-zeroes'. (Which is OK for real blend, but needs to be enforced via the flag for blend emulation)
>
> Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine)
>
> ==============================
> Test summary
> ==============================
>    TEST                                  TOTAL  PASS  FAIL ERROR  SKIP
> >> jtreg:test/jdk/jdk/incubator/vector      83    71    10     0     2 <<
> ==============================
> TEST FAILURE
>
> After the fix:
>
> ==============================
> Test summary
> ==============================
>    TEST                                  TOTAL  PASS  FAIL ERROR  SKIP
>    jtreg:test/jdk/jdk/incubator/vector      83    81     0     0     2
> ==============================
> TEST SUCCESS
>
> And on an AVX512 machine:
>
> ==============================
> Test summary
> ==============================
>    TEST                                  TOTAL  PASS  FAIL ERROR  SKIP
>    jtreg:test/jdk/jdk/incubator/vector      83    81     0     0     2
> ==============================
> TEST SUCCESS

Marked as reviewed by jbhateja (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24645#pullrequestreview-2809225089

From epeter at openjdk.org Thu May 1 06:23:02 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 06:23:02 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v47]
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 15:57:30 GMT, Quan Anh Mai wrote:

>>> We want to obtain a value that is larger than lo, has the bit at a certain position set and all bits after that unset.
>>
>> So much I knew already, but could still not understand the formula ?
>>
>>> This is the standard operation for alignment when we know that lo is unaligned.
>>
>> To me it is still not "standard". And you use a similar formula elsewhere, so maybe it could be helpful to explain it in more detail somewhere?
>>
>> - if `first_violation == 0`: `alignment = 100..00 = -alignment`. So if `lo >= alignment` -> `lo & -alignment = 100..000`, and `new_lo = 0`, we have an overflow it seems? I guess that would make sense. And if `lo < alignment`, the result is rounded up to `100..000`, also good.
>> - if `first_violation > 0`: `alignment = 0..010..0`. So `-alignment = 1..110..0`. Now you could probably continue with arguing about the bits of `lo`, and continue that way in a case distinction.
>>
>> To me this seems less than immediately clear or trivial. Maybe I'm just missing some "standard" math, that is well possible ?
>
> Maybe the thing you are missing here is that if a value `v` is a power of 2, then its negation `-v` has all the bits set from the bit that is set in `v` upwards. E.g. `4 = 0b00...0100 -> -4 = 0b11...1100`. So `lo & -alignment` unsets all the bits below the set bit in `alignment`, a.k.a. rounding down according to `alignment`. So if you want to round up you just add `alignment`, since `lo` is known to not be divisible by `alignment`.

Nice, the new code comments really help :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069892353

From jbhateja at openjdk.org Thu May 1 06:23:45 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 1 May 2025 06:23:45 GMT
Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts
In-Reply-To:
References:
Message-ID:

On Wed, 23 Apr 2025 16:21:51 GMT, Volodymyr Paprotski wrote:

> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct?
>
> @eme64 Thanks for looking.
Point form in attempt to be concise: > > * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) > * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). > * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. Hi @vpaprotsk , @eme64, Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index. Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. Best Regards, Jatin ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2844181540 From epeter at openjdk.org Thu May 1 06:32:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 06:32:50 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:17:44 GMT, Jatin Bhateja wrote: >>> @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. 
Point form in attempt to be concise: >> - Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> - The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> - (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > >> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. Point form in attempt to be concise: >> >> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > > Hi @vpaprotsk , @eme64, > > Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. 
Each row of the lookup table holds either a valid permute index, which is a positive index value less than the vector lane count, or a -1 index.
>
> Since blend emulation always expects a blend mask vector whose lanes hold either -1 or 0, the desired blend mask has to be recomposed by sign-extending the MSB of each lane to fill the entire lane. Your fix to recompute the mask looks good to me.
>
>
> Best Regards,
> Jatin

@jatin-bhateja Thanks for reviewing!

@vpaprotsk I'm ready to give the approval too, just want to run some internal testing first - please ping me again in 24h :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2844193698

From epeter at openjdk.org Thu May 1 06:23:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 06:23:00 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v52]
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 16:11:45 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>>
>> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>>
>> This is extracted from #15440, node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>>
>> Please kindly review, thanks a lot.
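
The constraint set described in the PR summary above can be read as a plain membership test. The sketch below is only an illustration of that formula for 32-bit values: `ToyIntConstraints` and `contains` are hypothetical names invented here, not the HotSpot `TypeInt` class or its API, and no canonicalization is performed.

```cpp
#include <cstdint>

// Hypothetical struct for illustration; not the HotSpot TypeInt class.
struct ToyIntConstraints {
  int32_t  lo, hi;    // signed bounds:   x s>= lo && x s<= hi
  uint32_t ulo, uhi;  // unsigned bounds: x u>= ulo && x u<= uhi
  uint32_t zeros;     // bits known to be 0: (x & zeros) == 0
  uint32_t ones;      // bits known to be 1: (x & ones) == ones

  // Direct transcription of the membership condition from the PR summary.
  bool contains(int32_t x) const {
    uint32_t ux = static_cast<uint32_t>(x);
    return x >= lo && x <= hi &&
           ux >= ulo && ux <= uhi &&
           (ux & zeros) == 0 &&
           (ux & ones) == ones;
  }
};
```

Taking the example from the summary, a signed range of [0, 3] with otherwise unconstrained fields (`{0, 3, 0u, 0xFFFFFFFFu, 0u, 0u}`) and its tightened form (`{0, 3, 0u, 3u, ~3u, 0u}`) accept exactly the same values, which is the sense in which the constraints are not independent.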
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > alignment note Thanks for the updates, looks much better already :) Left some suggestions for `a such` -> `such a`. src/hotspot/share/opto/rangeinference.cpp line 91: > 89: // > 90: // If there exists a value not less than lo and satisfies bits, then this > 91: // function will always find a such value. The conversion is also true, that is Suggestion: // function will always find such a value. The conversion is also true, that is src/hotspot/share/opto/rangeinference.cpp line 93: > 91: // function will always find a such value. The conversion is also true, that is > 92: // if this function finds a value not less than lo and satisfies bits, then it > 93: // must trivially be the case that there exists a such value. As a result, the Suggestion: // must trivially be the case that there exists such a value. As a result, the src/hotspot/share/opto/rangeinference.cpp line 96: > 94: // negation of those statements are also equivalent, there does not exists a > 95: // value not less than lo and satisfies bits if and only if this function does > 96: // not return a such value. Suggestion: // not return such a value. src/hotspot/share/opto/rangeinference.cpp line 99: > 97: // > 98: // In practice, since the algorithm always ensures that the returned value > 99: // satisfies bits, we only need to check if it is not less than lo. Ah nice. Ok. 
So now we just have to prove that the result satisfies bits in all cases :) ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2809219375 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069889936 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069890407 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069890666 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069893090 From epeter at openjdk.org Thu May 1 06:44:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 06:44:46 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:17:44 GMT, Jatin Bhateja wrote: >>> @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. Point form in attempt to be concise: >> - Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> - The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> - (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > >> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. 
Point form in attempt to be concise: >> >> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > > Hi @vpaprotsk , @eme64, > > Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index. > > Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. > > > Best Regards, > Jatin @jatin-bhateja It seems the flag `-XX:+EnableX86ECoreOpts` only is enabled on some very specific machines. How important / wide spread are these machines? Will they become more wide spread over time? Or is this rather rare, and not worth investing too many resources? How does their importance compare to AVX and AVX2, or machines with only SSE2 or SSE4.1? Because we put a focus on SSE/AVX in internal testing, but I'm wondering if we should also test `EnableX86ECoreOpts` more. How does this flag interact with AVX features? 
Do ECore machines always have AVX2 for example? What would be good flag combinations here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2844205682 From epeter at openjdk.org Thu May 1 07:00:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 07:00:48 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v48] In-Reply-To: <6wRetlYEA7DpdE-aPXlUVbGtnMu2YFFJF01MbqpD198=.3ce0cd6b-5822-4091-95ee-f9f58a622a7b@github.com> References: <1JdbQdQdyLjm1VeDO6ESqsuu15kHSZs87duzD3BlhyE=.900ff431-6fe6-435b-9056-9a5d9301f3c4@github.com> <6wRetlYEA7DpdE-aPXlUVbGtnMu2YFFJF01MbqpD198=.3ce0cd6b-5822-4091-95ee-f9f58a622a7b@github.com> Message-ID: On Wed, 30 Apr 2025 15:50:10 GMT, Quan Anh Mai wrote: >> @merykitty >> Hmm, I'm a little nervous about the case where `there does not exist one such number`. Because all your proof does is basically assume that there is such a `r`, and then shows that we compute it correctly. >> >> But such proofs do not give us the guarantee that if there is no such `r`, that the computation indeed overflows, i.e. produces a number smaller than `lo`. That would be required for the correctness, no? >> >> So I guess the proof / examples with overflow should happen further down, I'm just mentioning it up here because it is here that you say there can be an overflow. Hope that makes sense ? > > Thanks for the clarification, I have added a section for this case. > > The fundamental logic here is that if there exists a result `r`, then the algorithm will find it. The opposite is also true, if the algorithm finds a satisfying `r`, then of course there exists one. This makes the negation of those statements also equivalent, the algorithm does not find a satisfying result if and only if there is no such value. Right, I think the missing part was that then the result is smaller than `lo`, which depends on that the result still satisfies bits... 
you now wrote a section about that, so that's better now :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069910446

From epeter at openjdk.org Thu May 1 07:00:48 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 07:00:48 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 06:57:29 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>>
>> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>>
>> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>>
>> Please kindly review, thanks a lot.
>>
>> Testing
>>
>> - [x] GHA
>> - [x] Linux x64, tier 1-4
>
> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>
> - wording
> - grammar, more details for non-existence

src/hotspot/share/opto/rangeinference.cpp line 306:

> 304: // a value not less than lo and satisfies bits. This is because there is
> 305: // always a bit up to first_violation that is 0 in both lo and zeros
> 306: // (trivially, it is the bit at first_violation).

Essentially, you are saying that the addition in the alignment for `new_lo` cannot overflow, right?

src/hotspot/share/opto/rangeinference.cpp line 362: > 360: // if there is no bit up to first_violation that is 0 in both lo and zeros, > 361: // i.e. tmp == 0. In such cases, alignment == 0 && lo == bits._ones. It is > 362: // the only case when this function does not return a valid answer. Wow, that sounds like your algorithm is broken. Or is it still valid, it just overflows, and gets you a result smaller than `lo`, but that is actually expected? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069914112 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069915747 From qamai at openjdk.org Thu May 1 07:00:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 07:00:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - wording - grammar, more details for non-existence ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/0eafb3ab..654f8333 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=52 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=51-52 Stats: 17 lines in 1 file changed: 8 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu May 1 07:00:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 07:00:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7] In-Reply-To: References: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> <3kJkhxljNnUZ_b6c3jGsi4YA-JEDnZG79jblC_G47Jc=.bd8bf380-8150-4c2c-b4c9-a1b11884c01a@github.com> Message-ID: On Wed, 30 Apr 2025 15:37:20 GMT, Emanuel Peter wrote: >> @eme64 Ping. Please don't be annoyed as I think I will ping you more frequently in case you forget. > > I have to say I'm really impressed by all the bit tricks you are using here @merykitty . I'm learning a lot, and I'm very thankful for your patience with me here, and constructing the proofs ? @eme64 > Left some suggestions for `a such` -> `such a`. The correct grammar here is `one such`, not sure why I used `a` instead > I'll check the proofs again to see if this is clear enough later. I added some more comments at the return point for each case to illustrate when it can happen that no valid answer exists. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2844214693 From qamai at openjdk.org Thu May 1 07:04:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 07:04:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:54:52 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - wording >> - grammar, more details for non-existence > > src/hotspot/share/opto/rangeinference.cpp line 306: > >> 304: // a value not less than lo and satisfies bits. This is because there is >> 305: // always a bit up to first_violation that is 0 in both lo and zeros >> 306: // (trivially, it is the bit at first_violation). > > Essencially, you are saying that the addition in the alignment for `new_lo` cannot overflow, right? Yes, because the bit at alignment is 0, this operation just sets it to 1 and set all bits after to 0. So there is no overflow here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069918044 From qamai at openjdk.org Thu May 1 07:07:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 07:07:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:57:29 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - wording >> - grammar, more details for non-existence > > src/hotspot/share/opto/rangeinference.cpp line 362: > >> 360: // if there is no bit up to first_violation that is 0 in both lo and zeros, >> 361: // i.e. tmp == 0. In such cases, alignment == 0 && lo == bits._ones. It is >> 362: // the only case when this function does not return a valid answer. > > Wow, that sounds like your algorithm is broken. 
Or is it still valid, it just overflows, and gets you a result smaller than `lo`, but that is actually expected? It is the latter. From the overview, we can see that this function returns an invalid answer if and only if there exists no valid answer (no value not less than `lo` and satisfies `bits`). This clarifies further that in such cases, the algorithm will always return `bits._ones`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069919824 From qamai at openjdk.org Thu May 1 07:27:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 07:27:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: <4fxhvRpgY4lA7OGWkErTHhybhV7PCMhuM09DNjJNdy8=.27d70aee-1490-4286-890d-11249be9dad7@github.com> On Thu, 1 May 2025 07:01:26 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 306: >> >>> 304: // a value not less than lo and satisfies bits. This is because there is >>> 305: // always a bit up to first_violation that is 0 in both lo and zeros >>> 306: // (trivially, it is the bit at first_violation). >> >> Essencially, you are saying that the addition in the alignment for `new_lo` cannot overflow, right? > > Yes, because the bit at alignment is 0, this operation just sets it to 1 and set all bits after to 0. So there is no overflow here. Added a sentence to clarify that this computation cannot overflow. >> src/hotspot/share/opto/rangeinference.cpp line 362: >> >>> 360: // if there is no bit up to first_violation that is 0 in both lo and zeros, >>> 361: // i.e. tmp == 0. In such cases, alignment == 0 && lo == bits._ones. It is >>> 362: // the only case when this function does not return a valid answer. >> >> Wow, that sounds like your algorithm is broken. Or is it still valid, it just overflows, and gets you a result smaller than `lo`, but that is actually expected? > > It is the latter. 
From the overview, we can see that this function returns an invalid answer if and only if there exists no valid answer (no value not less than `lo` and satisfies `bits`). This clarifies further that in such cases, the algorithm will always return `bits._ones`. Added a section to describe specifically which computation overflows in this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069927921 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069928220 From epeter at openjdk.org Thu May 1 07:27:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 07:27:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:00:46 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>>
>> Testing
>>
>> - [x] GHA
>> - [x] Linux x64, tier 1-4
>
> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>
> - wording
> - grammar, more details for non-existence

src/hotspot/share/opto/rangeinference.cpp line 263:

> 261: // is a 0 that should be a 1. Obviously, since the bit at that position in
> 262: // ones is 1, the same bit in zeros is 0. Which means this is the value of
> 263: // i we are looking for.

Suggestion:

    // This means that the first bit that does not satisfy the bit requirement
    // is a 0 that should be a 1. Obviously, since the bit at that position in
    // ones is 1, the same bit in zeros is 0. Which means this is the value of
    // i we are looking for.
    // We know i is the largest bit index such that:
    // - lo[x] satisfies bits for 0 <= x < i (2.2)
    // - zeros[i] = 0 (2.3)
    // - lo[i] = 0 (2.4)
    // For the given i, we know that lo satisfies all bits before i, hence (2.2)
    // holds. Further, lo[i] = 0 (2.4), and we have a one violation at i, hence
    // zeros[i] = 0 (2.3). Any smaller i would not be the largest possible such
    // index. Any larger i would violate (2.2), since lo[i] does not satisfy bits.

I just realized that we should explicitly tie this back in to the proof we wrote above, and link it to the numbers there.

src/hotspot/share/opto/rangeinference.cpp line 285:

> 283: // all bits after to zero. This is similar to an operation that aligns lo
> 284: // up to this modulo
> 285: // 0 0 0 1 0 0 0 0

Suggestion:

    // This is the bit at which we want to change the bit 0 in lo to a 1, and
    // all bits after to zero. This is similar to an operation that aligns lo
    // up to this modulo
    // 0 0 0 1 0 0 0 0
    // This is in preparation for (2.6) in the construction: v[i] = 1.

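As a hedged illustration of the align-up step described in the comment above: the following is a standalone sketch, not the code in `rangeinference.cpp`. The helper name `align_up_at_bit` and the example values are assumptions made for this example, and "all bits after" is read here as "all lower-order bits". Clearing every bit below the chosen position and then adding the single-bit `alignment` sets that bit to 1; because the bit is 0 in `lo`, the addition does not carry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the patch. Preconditions (assumed):
// 'alignment' has exactly one bit set, and that bit is 0 in 'lo'.
inline uint32_t align_up_at_bit(uint32_t lo, uint32_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0); // single bit
  assert((lo & alignment) == 0);                                // bit is 0 in lo
  // (0 - alignment) is a mask that keeps the alignment bit and all higher bits,
  // so this clears every bit below the alignment bit.
  uint32_t cleared = lo & (0u - alignment);
  // The alignment bit is still 0 in 'cleared', so adding it only sets that bit;
  // no carry propagates into higher bits.
  return cleared + alignment;
}
```

For `lo = 0b01001011` and `alignment = 0b00010000` the sketch returns `0b01010000`: the chosen bit becomes 1 and everything below it becomes 0.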
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069929837 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069931509 From epeter at openjdk.org Thu May 1 07:27:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 07:27:00 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:21:54 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - wording >> - grammar, more details for non-existence > > src/hotspot/share/opto/rangeinference.cpp line 263: > >> 261: // is a 0 that should be a 1. Obviously, since the bit at that position in >> 262: // ones is 1, the same bit in zeros is 0. Which means this is the value of >> 263: // i we are looking for. > > Suggestion: > > // This means that the first bit that does not satisfy the bit requirement > // is a 0 that should be a 1. Obviously, since the bit at that position in > // ones is 1, the same bit in zeros is 0. Which means this is the value of > // we are looking for. > // We know i is the largest bit index such that: > // - lo[x] satisfies bits for 0 <= x < i (2.2) > // - zeros[i] = 0 (2.3) > // - lo[i] = 0 (2.4) > // For the given i, we know that lo satisfies all bits before i, hence (2.2) > // holds. Further, lo[i] = 0 (2.3), and we have a one violation at i, hence > // zero[i] = 0 (2.4). Any smaller i would not be the largest possible such > // index. Any larger i would violate (2.2), since lo[i] does not satisfy bits. > > I just realized that we should explicitly tie this back in to the proof we wrote above, and link it to the numbers there. Now below, we should try to tie the construction back to (2.5 - 2.7), let me give it a try. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069930283 From duke at openjdk.org Thu May 1 07:32:09 2025 From: duke at openjdk.org (erifan) Date: Thu, 1 May 2025 07:32:09 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: > > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. 
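To illustrate the rewrite rule above, here is a standalone scalar sketch. It is not C2 code: the `Cond` enum and the helpers `negate`, `eval` and `negation_holds` are hypothetical names invented for this example. For integer operands, inverting the result of a compare gives the same answer as comparing with the negated condition, which is what allows the XOR with an all-ones mask to be folded into the compare.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical condition codes mirroring the list in the description.
enum class Cond { eq, ne, lt, ge, le, gt, ult, uge, ule, ugt };

// Maps each condition to its logical negation (cond -> ncond).
Cond negate(Cond c) {
  switch (c) {
    case Cond::eq:  return Cond::ne;
    case Cond::ne:  return Cond::eq;
    case Cond::lt:  return Cond::ge;
    case Cond::ge:  return Cond::lt;
    case Cond::le:  return Cond::gt;
    case Cond::gt:  return Cond::le;
    case Cond::ult: return Cond::uge;
    case Cond::uge: return Cond::ult;
    case Cond::ule: return Cond::ugt;
    case Cond::ugt: return Cond::ule;
  }
  return c; // not reached for valid inputs
}

// Evaluates one scalar compare; the u* conditions compare the unsigned bit patterns.
bool eval(Cond c, int32_t a, int32_t b) {
  uint32_t ua = static_cast<uint32_t>(a);
  uint32_t ub = static_cast<uint32_t>(b);
  switch (c) {
    case Cond::eq:  return a == b;
    case Cond::ne:  return a != b;
    case Cond::lt:  return a < b;
    case Cond::ge:  return a >= b;
    case Cond::le:  return a <= b;
    case Cond::gt:  return a > b;
    case Cond::ult: return ua < ub;
    case Cond::uge: return ua >= ub;
    case Cond::ule: return ua <= ub;
    case Cond::ugt: return ua > ub;
  }
  return false; // not reached for valid inputs
}

// Checks that !(a cond b) == (a ncond b) for every condition.
bool negation_holds(int32_t a, int32_t b) {
  const Cond all[] = {Cond::eq, Cond::ne, Cond::lt, Cond::ge, Cond::le,
                      Cond::gt, Cond::ult, Cond::uge, Cond::ule, Cond::ugt};
  for (Cond c : all) {
    if ((!eval(c, a, b)) != eval(negate(c), a, b)) return false;
  }
  return true;
}
```

A likely reason the float and double patterns are limited to eq and ne is NaN: with a NaN operand both `a < b` and `a >= b` are false, so `!(a < b)` is not the same as `a >= b`, whereas `!(a == b)` always equals `a != b`.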
Benchmarks on Nvidia Grace machine with 128-bit SVE2:

With option `-XX:UseSVE=2`:
```
Benchmark Unit Before Score Error After Score Error Uplift
testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29
testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33
testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33
testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31
testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31
testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41
testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29
testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39
testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37
testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41
testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29
testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4
testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38
testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41
testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29
testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4
testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38
testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41
testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29
testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4
testCompareLTMaskNotLong ops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37
testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41
testCompareNEMaskNotByte ops/s 7910002.518 2771.82477 10245315.33 16321.93935 1.29
testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36
testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.1401 1.36
testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4
testCompareNEMaskNotLong ops/s 836327.6835 410.525466 1178178.825 308.757932 1.4
testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41
testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29
testCompareUGEMaskNotInt ops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4
testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.6621 1.38
testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41
testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29
testCompareUGTMaskNotInt ops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38
testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38
testCompareUGTMaskNotShort ops/s 3327236.988 1616.886789 4711209.865 3098.494145 1.41
testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29
testCompareULEMaskNotInt ops/s 1672350.925 1545.304304 2353231.755 914.231193 1.4
testCompareULEMaskNotLong ops/s 853349.4765 9804.906913 1177967.254 435.044367 1.38
testCompareULEMaskNotShort ops/s 3325757.891 1555.062257 4712873.187 1650.986905 1.41
testCompareULTMaskNotByte ops/s 7912218.621 2633.477744 10242095.98 21921.39902 1.29
testCompareULTMaskNotInt ops/s 1673994.849 2672.507666 2353449.22 946.105757 1.4
testCompareULTMaskNotLong ops/s 849032.5868 10406.06689 1177586.047 506.541456 1.38
testCompareULTMaskNotShort ops/s 3328062.026 1892.991844 4713247.216 1855.983724 1.41
```

With option `-XX:UseSVE=0`:
```
Benchmark Unit Before Score Error After Score Error Uplift
testCompareEQMaskNotByte ops/s 7895961.919 72712.90804 7746493.731 71481.92938 0.98
testCompareEQMaskNotDouble ops/s 789811.0455 384.493088 766473.7994 2216.581793 0.97
testCompareEQMaskNotFloat ops/s 1806305.818 638.010451 1819616.613 3295.38958 1
testCompareEQMaskNotInt ops/s 1815820.144 1225.336135 1849538.401 766.29902 1.01
testCompareEQMaskNotLong ops/s 807336.492 335.451807 792732.9483 277.954432 0.98
testCompareEQMaskNotShort ops/s 4818266.38 1927.862665 4668903.001 1922.782715 0.96
testCompareGEMaskNotByte ops/s 7818439.678 75374.97739 16498003.98 41440.49653 2.11
testCompareGEMaskNotInt ops/s 1815159.05 1090.912209 2372095.779 1664.397112 1.3
testCompareGEMaskNotLong ops/s 804324.5575 2301.686878 927919.8507 371.766719 1.15
testCompareGEMaskNotShort ops/s 4818966.563 2443.643652 5385561.038 29558.37423 1.11
testCompareGTMaskNotByte ops/s 7893406.157 82687.74264 16470663.2 22165.55812 2.08
testCompareGTMaskNotInt ops/s 1815316.812 915.894106 2370447.198 655.016338 1.3
testCompareGTMaskNotLong ops/s 807019.456 526.525482 928079.0541 330.582693 1.15
testCompareGTMaskNotShort ops/s 4820552.881 1684.247747 5355902.93 5893.2915 1.11
testCompareLEMaskNotByte ops/s 7816263.323 79560.0015 16473621.19 56688.99585 2.1
testCompareLEMaskNotInt ops/s 1814915.724 926.998625 2368790.306 932.594778 1.3
testCompareLEMaskNotLong ops/s 806483.9 935.718082 928110.9074 407.096695 1.15
testCompareLEMaskNotShort ops/s 4813660.241 6817.870509 5357107.852 10061.47975 1.11
testCompareLTMaskNotByte ops/s 7838948.962 69136.4504 16424405.96 24464.75469 2.09
testCompareLTMaskNotInt ops/s 1815056.833 1187.6453 2369892.187 1103.819634 1.3
testCompareLTMaskNotLong ops/s 806602.1804 287.923365 928346.4118 617.682824 1.15
testCompareLTMaskNotShort ops/s 4817940.643 2767.1509 5372537.84 15397.47169 1.11
testCompareNEMaskNotByte ops/s 9078493.798 4630.339307 16484348.42 18925.88346 1.81
testCompareNEMaskNotDouble ops/s 661769.6272 398.712981 926763.5839 1808.843788 1.4
testCompareNEMaskNotFloat ops/s 1570527.252 563.642144 2312425.678 1815.844846 1.47
testCompareNEMaskNotInt ops/s 1619146.58 626.793854 2369711.543 942.330478 1.46
testCompareNEMaskNotLong ops/s 680201.5381 2252.836482 927808.6147 414.917863 1.36
testCompareNEMaskNotShort ops/s 3763508.054 3622.560798 5367808.015 8591.466599 1.42
testCompareUGEMaskNotByte ops/s 7886373.129 75917.74675 16480928.93 27524.31005 2.08
testCompareUGEMaskNotInt ops/s 1815636.832 750.036241 2369683.015 901.609404 1.3
testCompareUGEMaskNotLong ops/s 806862.5826 287.819616 928001.4394 361.063837 1.15
testCompareUGEMaskNotShort ops/s 4820581.361 2098.537435 5375854.248 25619.40165 1.11
testCompareUGTMaskNotByte ops/s 7891591.465 96614.93542 16410405.93 15012.37096 2.07
testCompareUGTMaskNotInt ops/s 1814871.179 662.825588 2371325.903 1170.491164 1.3
testCompareUGTMaskNotLong ops/s 804013.7658 2240.534209 928062.2169 531.306897 1.15
testCompareUGTMaskNotShort ops/s 4818150.337 3051.717685 5381449.337 21212.34187 1.11
testCompareULEMaskNotByte ops/s 7831540.628 81306.67253 16495250.78 38682.19675 2.1
testCompareULEMaskNotInt ops/s 1814484.14 687.860656 2369265.075 940.609586 1.3
testCompareULEMaskNotLong ops/s 807780.5749 769.876816 927538.0732 1278.267724 1.14
testCompareULEMaskNotShort ops/s 4817437.42 5141.336541 5356183.359 7015.608124 1.11
testCompareULTMaskNotByte ops/s 7849078.225 56753.59764 16395975.27 34043.67295 2.08
testCompareULTMaskNotInt ops/s 1814328.226 2697.219111 2370700.47 1991.841988 1.3
testCompareULTMaskNotLong ops/s 807166.8197 253.061506 927926.2803 252.933462 1.14
testCompareULTMaskNotShort ops/s 4821098.216 1625.959044 5348980.243 4100.768121 1.1
```

Benchmarks on AMD EPYC 9124 16-Core Processor:

With option `-XX:UseAVX=3`:
```
Benchmark Unit Before Score Error After Score Error Uplift
testCompareEQMaskNotByte ops/s 16607323.35 1233692.631 18381557.66 1163201.522 1.1
testCompareEQMaskNotDouble ops/s 2114285.245 58782.2534 2959946.353 43016.0445 1.39
testCompareEQMaskNotFloat ops/s 4480874.437 89975.29074 6960151.436 64799.143 1.55
testCompareEQMaskNotInt ops/s 4370906.91 51784.80889 6856955.043 313858.5504 1.56
testCompareEQMaskNotLong ops/s 2080065.895 26762.06732 2939142.143 67179.05314 1.41
testCompareEQMaskNotShort ops/s 7968282.563 210437.2781 12701214.56 473152.6407 1.59
testCompareGEMaskNotByte ops/s 18419141.89 473408.9451 19880059.68 321638.0397 1.07
testCompareGEMaskNotInt ops/s 4419015.62 77352.98633 7037639.227 151066.0383 1.59
testCompareGEMaskNotLong ops/s 2147982.48 49227.42782 3000275.928 39298.75344 1.39
testCompareGEMaskNotShort ops/s 8469039.613 17833.19707 12288229.49 244317.8812 1.45
testCompareGTMaskNotByte ops/s 18728997.5 468328.8358 20544730.05 392264.6466 1.09
testCompareGTMaskNotInt ops/s 4510009.705 78812.57357 7364629.942 70970.78473 1.63
testCompareGTMaskNotLong ops/s 2124104.969 40917.89257 2953536.279 35199.19687 1.39
testCompareGTMaskNotShort ops/s 8690557.621 311534.1159 12344017.51 457931.8741 1.42
testCompareLEMaskNotByte ops/s 17758400.53 478383.4945 19209183.26 1143297.241 1.08
testCompareLEMaskNotInt ops/s 4363664.862 43443.18063 7054093.064 78141.11476 1.61
testCompareLEMaskNotLong ops/s 2068632.213 29844.78023 2954766.412 50667.22502 1.42
testCompareLEMaskNotShort ops/s 8637608.548 183538.5511 12719010.27 473568.8825 1.47
testCompareLTMaskNotByte ops/s 14406138.95 423105.0163 17292417.96 371386.9689 1.2
testCompareLTMaskNotInt ops/s 4546707.266 131977.3144 7040483.394 213590.4657 1.54
testCompareLTMaskNotLong ops/s 2123277.356 47243.21499 2848720.442 58896.97045 1.34
testCompareLTMaskNotShort ops/s 7570169.363 649873.6295 11945383.75 988276.5955 1.57
testCompareNEMaskNotByte ops/s 18274529.55 683396.7384 19081938.8 1118739.778 1.04
testCompareNEMaskNotDouble ops/s 2112533.61 43295.50012 2912115.441 78189.51083 1.37
testCompareNEMaskNotFloat ops/s 4628683.814 93817.07362 6967208.729 145135.8544 1.5
testCompareNEMaskNotInt ops/s 4470900.214 75974.50842 7286913.662 116328.5277 1.62
testCompareNEMaskNotLong ops/s 2134091.061 46377.94061 2934667.477 81675.46021 1.37
testCompareNEMaskNotShort
ops/s 8790384.287 396161.8599 13076858.35 286272.1155 1.48 testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97 testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54 testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42 testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53 testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11 testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74 testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44 testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47 testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05 testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46 testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34 testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44 testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18 testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57 testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44 testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5 ``` The small amount of performance degradation is due to test fluctuations. 
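
For illustration only: the optimization benchmarked here folds a mask NOT into the compare by switching to the negated condition, and for float and double it is limited to eq and ne. A scalar sketch of why that limit exists is below. It is not HotSpot code; `Cond`, `negate`, `eval` and `not_folds` are hypothetical names made up for this example. With a NaN operand, `!(a < b)` is true while `a >= b` is false, so only eq/ne can be negated safely for floating point.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of a compare condition (not HotSpot's BoolTest).
enum class Cond { eq, ne, lt, ge, gt, le };

// The condition that would replace `cond` once the NOT is folded in.
constexpr Cond negate(Cond c) {
  switch (c) {
    case Cond::eq: return Cond::ne;
    case Cond::ne: return Cond::eq;
    case Cond::lt: return Cond::ge;
    case Cond::ge: return Cond::lt;
    case Cond::gt: return Cond::le;
    case Cond::le: return Cond::gt;
  }
  return c;  // unreachable, keeps compilers quiet
}

// Evaluate `a cond b` for one lane.
template <typename T>
bool eval(Cond c, T a, T b) {
  switch (c) {
    case Cond::eq: return a == b;
    case Cond::ne: return a != b;
    case Cond::lt: return a <  b;
    case Cond::ge: return a >= b;
    case Cond::gt: return a >  b;
    case Cond::le: return a <= b;
  }
  return false;  // unreachable
}

// True when NOT(compare(a, b, cond)) equals compare(a, b, negate(cond)).
template <typename T>
bool not_folds(Cond c, T a, T b) {
  return (!eval(c, a, b)) == eval(negate(c), a, b);
}
```

For integer lanes `not_folds` holds for every condition and every pair of inputs; for floating-point lanes it holds for eq and ne but fails for the ordered conditions as soon as one input is NaN.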
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24674/files
  - new: https://git.openjdk.org/jdk/pull/24674/files/34eae981..4fbf84e3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=02-03

  Stats: 13350 lines in 388 files changed: 9247 ins; 1843 del; 2260 mod
  Patch: https://git.openjdk.org/jdk/pull/24674.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674

PR: https://git.openjdk.org/jdk/pull/24674

From shade at openjdk.org Thu May 1 07:33:02 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 1 May 2025 07:33:02 GMT
Subject: Integrated: 8355769: Optimize nmethod dependency recording
In-Reply-To:
References:
Message-ID: <6iWJ8ggUQC89qdDbkBQDWkhWlb4Cp6zk1ePs-fi4iMw=.23eca736-e102-4633-bb35-41554f18a4f1@github.com>

On Mon, 28 Apr 2025 18:01:42 GMT, Aleksey Shipilev wrote:

> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK.
>
> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost.
>
> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!).
>
> Additional testing:
>  - [x] Ad-hoc benchmarks
>  - [x] Linux x86_64 server fastdebug, `all`
>  - [x] Linux AArch64 server fastdebug, `all`

This pull request has now been integrated.

Changeset: 8ef7e256
Author: Aleksey Shipilev
URL: https://git.openjdk.org/jdk/commit/8ef7e256a09115b475601b621c9b6842185289d0
Stats: 31 lines in 3 files changed: 24 ins; 2 del; 5 mod

8355769: Optimize nmethod dependency recording

Reviewed-by: adinn, vlivanov, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/24933

From epeter at openjdk.org Thu May 1 07:37:00 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 07:37:00 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v7]
In-Reply-To:
References: <-8hODpwHoSEctN2Oo2SrCY1aRpkvZ_kcpnOntQLXgC4=.ecf517f4-9442-4812-81dc-1c177f9d70bf@github.com> <3kJkhxljNnUZ_b6c3jGsi4YA-JEDnZG79jblC_G47Jc=.bd8bf380-8150-4c2c-b4c9-a1b11884c01a@github.com>
Message-ID:

On Thu, 1 May 2025 06:51:21 GMT, Quan Anh Mai wrote:

>> I have to say I'm really impressed by all the bit tricks you are using here @merykitty . I'm learning a lot, and I'm very thankful for your patience with me here, and constructing the proofs.
>
> @eme64
>
>> Left some suggestions for `a such` -> `such a`.
>
> The correct grammar here is `one such`, not sure why I used `a` instead
>
>> I'll check the proofs again to see if this is clear enough later.
>
> I added some more comments at the return point for each case to illustrate when it can happen that no valid answer exists.

@merykitty with these suggestions, we link up the bit trickery back to the proof / construction of `r` :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2844256677

From duke at openjdk.org Thu May 1 07:34:52 2025
From: duke at openjdk.org (erifan)
Date: Thu, 1 May 2025 07:34:52 GMT
Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3]
In-Reply-To: <1Qo4pB9I7Ok4ntXSE-KkE0sv-Tp5EVCWriWnjcf2iEE=.a7e28640-85df-436a-9c82-3c067cc88dee@github.com>
References: <1Qo4pB9I7Ok4ntXSE-KkE0sv-Tp5EVCWriWnjcf2iEE=.a7e28640-85df-436a-9c82-3c067cc88dee@github.com>
Message-ID:

On Tue, 29 Apr 2025 10:22:22 GMT, Emanuel Peter wrote:

>> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>>
>>  - Addressed some review comments
>>
>>    1. Call VectorNode::Ideal() only once in XorVNode::Ideal.
>>    2. Improve code comments.
>>  - Merge branch 'master' into JDK-8354242
>>  - Merge branch 'master' into JDK-8354242
>>  - 8354242: VectorAPI: combine vector not operation with compare
>>
>>    This patch optimizes the following patterns:
>>    For integer types:
>>    ```
>>    (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>    => (VectorMaskCmp src1 src2 ncond)
>>    (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>    => (VectorMaskCmp src1 src2 ncond)
>>    ```
>>    cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>>    negative comparison of cond.
>>
>>    For float and double types:
>>    ```
>>    (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>    (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>    => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>>    ```
>>    cond can be eq or ne.
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393... > > Yes, this discussion is down to `requires` vs `applyIf`. This is my argument for `applyIf`, quoted from above, I have not yet seen an argument against it: > >> If you use @require, then the person does not realize there is a test AND the test is not run. If you use applyIf, the person does not realize there is a test, but it is run at least for result verifiation - and then the person MIGHT realize if the test catches a wrong result / crash. > > In my understanding, `requires` should only be used if the test really **requires** a certain platform or feature. That can be because some flags are only available under certain platforms for example. But for IR tests, we should try to always use `applyIf`, because it allows testing on other platforms. 
> > Actually, I filed this RFE a while ago: https://bugs.openjdk.org/browse/JDK-8310891 > We should try to move as many tests from using `requires` to `applyIf`, so that we have an increased test coverage. @eme64 @jatin-bhateja I have updated the test, thanks for your suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2844256626 From shade at openjdk.org Thu May 1 07:32:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 1 May 2025 07:32:57 GMT Subject: RFR: 8355769: Optimize nmethod dependency recording [v2] In-Reply-To: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> References: <1uNBgSbbBgdiHXyYL1fkY6HIKaH9CgwF5kN8ABlt3xo=.07c945af-3840-45d8-a1b3-2fe55c52df81@github.com> Message-ID: On Tue, 29 Apr 2025 12:47:25 GMT, Aleksey Shipilev wrote: >> During nmethod installation, we record the dependencies between InstanceKlass/CallSite and newly coming `nmethod`. In `DependencyContext::add_dependent_nmethod`, we are linearly scanning to see if the `nmethod` is already in the dependencies list. This costs quite a bit, especially with lots of compiled methods per IK. >> >> This is not a significant issue for normal JIT compilations, where the JIT costs dominate. But for Leyden, this kind of scan is a significant part of AOT code installation. For example in well-trained javac runs, there are chains of 500+ `nmethods` for some IKs that take 10+ us to scan. This is easily half of the entire AOT method installation cost. >> >> Fortunately, the way we do the nmethod dependency recording, it allows us to shortcut the scan. Since dependency recording holds the `CodeCache_lock` while adding new `nmethod` all over the various dependency lists, those dependency lists are ever in two states: no `nmethod` in the chain (no need to scan!), or `nmethod` is at the head of the chain (no need to scan!). 
>> >> Additional testing: >> - [x] Ad-hoc benchmarks >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Fiddle with locks Thanks all! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24933#issuecomment-2844253066 From epeter at openjdk.org Thu May 1 07:37:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 07:37:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:00:46 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - wording > - grammar, more details for non-existence Can you do the analogue with the else (one violation) case? 
That one is probably a bit harder, but I have faith in you ;)

src/hotspot/share/opto/rangeinference.cpp line 297:

> 295: // lo up to a multiple of alignment, we add alignment to the rounded down
> 296: // value.
> 297: // 1 1 0 1 0 0 0 0

Suggestion:

    // This is the first value which has the violated bit being 1, which means
    // that the result should not be smaller than this. This is a standard
    // operation to align a value up to a certain power of 2.
    // Since alignment is a power of 2, -alignment is a value having all the
    // bits being 1 up to the location of the bit in alignment (in the example,
    // -alignment = 11110000). As a result, lo & -alignment sets all bits after
    // the bit in alignment to 0, which is equivalent to rounding lo down to a
    // multiple of alignment. Since lo is not divisible by alignment, to round
    // lo up to a multiple of alignment, we add alignment to the rounded down
    // value.
    // 1 1 0 1 0 0 0 0
    // We now have:
    // - new_lo[x] = lo[x], for 0 <= x < i (2.5)
    // - new_lo[i] = 1 (2.6)
    // - new_lo[x] = 0, for x > i (not yet 2.7)

src/hotspot/share/opto/rangeinference.cpp line 301:

> 299: // Our current new_lo satisfies zeros, just OR it with ones to obtain the
> 300: // correct result
> 301: // 1 1 0 1 0 0 1 0

Suggestion:

    // Our current new_lo satisfies zeros, just OR it with ones to obtain the
    // correct result
    // 1 1 0 1 0 0 1 0
    // Since the bits at i and before are not changed, we now have:
    // - new_lo[x] = lo[x], for 0 <= x < i (2.5)
    // - new_lo[i] = 1 (2.6)
    // - new_lo[x] = ones[x], for x > i (2.7)
    // Hence: new_lo = r

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2844258870
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069933569
PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069935004

From epeter at openjdk.org Thu May 1 07:51:59 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 07:51:59 GMT
Subject: RFR: 8315066: Add unsigned bounds and known
bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:00:46 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - wording > - grammar, more details for non-existence Two minor suggestions for the proof around `new_hi`. I think the added proof helps, thanks for adding it :) src/hotspot/share/opto/rangeinference.cpp line 416: > 414: // > 415: // 2. Assume there is a value k that is larger than ~h such that k is not > 416: // larger than hi and k satisfies {bits._zeros, bits._ones}. As a result, ~k Suggestion: // 2. Assume there is a value k that is larger than ~h such that k is not // larger than hi, i.e. ~h < k <= hi and k satisfies {bits._zeros, bits._ones}. As a result, ~k Just to help the reader a little. src/hotspot/share/opto/rangeinference.cpp line 424: > 422: // As a result, ~h is the largest value not larger than hi that satisfies > 423: // bits (QED). 
> 424: U new_hi = ~adjust_lo(~bounds._hi, {bits._ones, bits._zeros}); Suggestion: U h = adjust_lo(~bounds._hi, {bits._ones, bits._zeros}) U new_hi = ~h; Just to tie it in to your proof from above. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2809297171 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069947334 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069945068 From qamai at openjdk.org Thu May 1 07:59:15 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 07:59:15 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with three additional commits since the last revision: - new_hi computation - refer back to the formality section - clarify where overflow comes from ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/654f8333..b29ff4ac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=53 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=52-53 Stats: 95 lines in 1 file changed: 57 ins; 9 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu May 1 08:02:02 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 08:02:02 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:34:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - wording >> - grammar, more details for non-existence > > Can you do the analogue with the else (one violation) case? > That one is probably a bit harder, but I have faith in you ;) @eme64 Done! You are right that it would be better to link the implementation back to the theoretical proof, especially the part after obtaining the value of `i`. 
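
For readers following the `new_lo` / `new_hi` discussion in this thread, here is a small sketch of the two bit tricks being reviewed: rounding up to a power-of-two alignment with `(lo & -alignment) + alignment`, and obtaining the upper bound as `~adjust_lo(~hi, {ones, zeros})`. It is not the code from `rangeinference.cpp`; the helper names, the 8-bit width and the brute-force searches are assumptions chosen so the duality can be checked exhaustively.

```cpp
#include <cassert>
#include <cstdint>

// Round lo up to the next multiple of alignment. Assumes alignment is a power
// of 2 and lo is NOT already a multiple of it (as in the reviewed comment).
constexpr uint32_t round_up_unaligned(uint32_t lo, uint32_t alignment) {
  // (0u - alignment) has every bit from alignment's bit upwards set, so the
  // AND rounds lo down; adding alignment then rounds it up.
  return (lo & (0u - alignment)) + alignment;
}

// A value satisfies the known bits if every bit in zeros is 0 and every bit
// in ones is 1.
constexpr bool satisfies(int v, uint8_t zeros, uint8_t ones) {
  return (v & zeros) == 0 && (v & ones) == ones;
}

// Brute-force reference (hypothetical helper): smallest v >= lo satisfying
// the bits, or -1 if no such 8-bit value exists.
inline int adjust_lo_ref(uint8_t lo, uint8_t zeros, uint8_t ones) {
  for (int v = lo; v <= 0xFF; v++) {
    if (satisfies(v, zeros, ones)) return v;
  }
  return -1;
}

// Brute-force reference: largest v <= hi satisfying the bits, or -1.
inline int adjust_hi_ref(uint8_t hi, uint8_t zeros, uint8_t ones) {
  for (int v = hi; v >= 0; v--) {
    if (satisfies(v, zeros, ones)) return v;
  }
  return -1;
}

// The duality under review: v satisfies {zeros, ones} exactly when ~v
// satisfies {ones, zeros}, and v <= hi exactly when ~v >= ~hi (unsigned).
inline int adjust_hi_via_complement(uint8_t hi, uint8_t zeros, uint8_t ones) {
  int h = adjust_lo_ref(static_cast<uint8_t>(~hi), ones, zeros);
  return h < 0 ? -1 : (~h & 0xFF);
}
```

With the made-up inputs `hi = 0xD6`, `zeros = 0x04`, `ones = 0x10`, the complement route first finds `h = 0x2C` above `~hi = 0x29` and returns `~h = 0xD3`, which is also what the direct downward search yields.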

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2844328674

From mli at openjdk.org Thu May 1 08:17:48 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 1 May 2025 08:17:48 GMT
Subject: Integrated: 8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java
In-Reply-To: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com>
References: <2Y39meZMDbec4LfN-EqdwJTvT_NIO1FqRBAv8bL5m6k=.b8359d80-9aa3-445e-a3d7-07868f7dff1b@github.com>
Message-ID:

On Tue, 29 Apr 2025 13:42:22 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch?
> Previously, BasicFloatOpTest.java was accidentally not really enabled on riscv.
> And FmaVF/FmaVD depend on both UseFMA and UseRVV; the code should make this clear.
> And IR verification of FmaVF in BasicFloatOpTest.java should only be enabled when UseFMA && rvv.
>
> Thanks!

This pull request has now been integrated.

Changeset: 0cd0afb2
Author: Hamlin Li
URL: https://git.openjdk.org/jdk/commit/0cd0afb2b32abd77f6275cf34a499b5cb31f22b5
Stats: 10 lines in 3 files changed: 2 ins; 0 del; 8 mod

8355913: RISC-V: improve hotspot/jtreg/compiler/vectorization/runner/BasicFloatOpTest.java

Reviewed-by: fyang, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/24950

From mli at openjdk.org Thu May 1 08:04:43 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 1 May 2025 08:04:43 GMT
Subject: RFR: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 00:57:24 GMT, Fei Yang wrote:

> Thanks!

Thank you!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24968#issuecomment-2844341656

From mli at openjdk.org Thu May 1 08:36:29 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 1 May 2025 08:36:29 GMT
Subject: RFR: 8355704: RISC-V: enable TestIRFma.java [v2]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you help to review this patch to enable TestIRFma.java?
> FmaF/D (checked by TestIRFma.java) are supported on riscv, but for some reason we can not enable it easily, but we should enable it. > > NOTE: the reason I change IRNode matching rules is that, previously it verify the `FINAL CODE` where every platform could have different instruct name; I change it from machOnlyNameRegex to beforeMatchingNameRegex, to make it verify the `PrintIdeal` where every platform share the same names. > > Also tested on machine with `asimd` support. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: adjust IR verification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24947/files - new: https://git.openjdk.org/jdk/pull/24947/files/590f6ae8..10a819d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24947&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24947&range=00-01 Stats: 71 lines in 2 files changed: 28 ins; 5 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/24947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24947/head:pull/24947 PR: https://git.openjdk.org/jdk/pull/24947 From mli at openjdk.org Thu May 1 08:36:29 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 1 May 2025 08:36:29 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java [v2] In-Reply-To: References: Message-ID: <71DhXQciBXxav_QAqDfyTSNuJO7OlYmQFBNgA57tk1U=.341ec2f2-3598-41f9-bfcd-741c2051d7e9@github.com> On Wed, 30 Apr 2025 14:42:59 GMT, Hamlin Li wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestIRFma.java line 89: >> >>> 87: >>> 88: @Test >>> 89: @IR(counts = {IRNode.FMSUB_F, "> 0", IRNode.NEG_F, "> 0"}, >> >> What is the reason for adding the "IRNode.NEG_F" ? > > I think I need to reconsider the test improvement, will update the pr later. > > Thank for having a look. I'll keep the test simple, let IR verification to verify the optimal FmaF/D nodes are generated, and let asserts to verify the functionality of these nodes are right. 
So it's a bit different from other platforms, which verify that the specific instructions are finally generated in the "FINAL CODE" phase. That is not bad, but it seems over-complicated and too white-box, and different platforms can have different instruction names for the same IR node (which is the case for FMA on riscv and aarch64).
And I'll leave aarch64 untouched.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24947#discussion_r2069990316

From epeter at openjdk.org Thu May 1 08:42:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 1 May 2025 08:42:01 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53]
In-Reply-To:
References:
Message-ID:

On Thu, 1 May 2025 08:02:42 GMT, Emanuel Peter wrote:

>> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - wording
>>  - grammar, more details for non-existence
>
> src/hotspot/share/opto/rangeinference.cpp line 454:
>
>> 452: // it
>> 453: U new_zeros = bits._zeros | (match_mask & ~bounds._lo);
>> 454: U new_ones = bits._ones | (match_mask & bounds._lo);
>
> Suggestion:
>
> U common_prefix = match_mask & bounds._lo;
> assert(common_prefix == match_mask & bounds._lo, "common prefix of both");
> U neg_common_prefix = match_mask & ~bounds._lo;
> assert(neg_common_prefix == match_mask & ~bounds._lo, "common negated prefix of both");
> U new_zeros = bits._zeros | neg_common_prefix;
> U new_ones = bits._ones | common_prefix;
>
> Just an idea. Up to you.

Hmm, I think your solution is understandable already, not sure mine is better. I'll still leave it for you here, feel free to mark it resolved.

> src/hotspot/share/opto/rangeinference.cpp line 509:
>
>> 507: // Trivially canonicalize the bounds so that srange._lo and urange._hi are
>> 508: // both < 0 or >= 0. The same for srange._hi and urange._ulo. See TypeInt for
>> 509: // detailed explanation.
>
> You seem to suggest that `urange._hi` can be `< 0`.
But it is unsigned, so that is confusing. > This looks like Lemma 3 from TypeInt is relevant here, right? You might want to state that here, > and still also bring this ASCII art up again here: > > * Signed: > * -----lo=========uhi---------0--------ulo==========hi----- > * Unsigned: > * 0--------ulo==========hi----------lo=========uhi--------- > > Also, does `S(urange._lo) > S(urange._hi)` imply `U(srange._lo) > U(srange._hi)`? > - If yes, assert it! > - If no: are we missing an optimization? It could also be that I'm missing some things here... So it could be good to refer to the exact things in `TypeInt` and the Lemma numbers you are using here. Otherwise it is a lot of work for the reader to check what you are doing here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069969301 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069991243 From epeter at openjdk.org Thu May 1 08:42:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 08:42:01 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:00:46 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. 
>> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - wording > - grammar, more details for non-existence A few more comments around canonicalization. src/hotspot/share/opto/rangeinference.cpp line 39: > 37: public: > 38: bool _progress; // whether there is progress compared to the last iteration > 39: bool _present; // whether the result is empty, typically due to the calculation arriving at contradiction Yeah, why not just call it `_is_non_empty`? I guess that is a matter of taste. src/hotspot/share/opto/rangeinference.cpp line 443: > 441: template > 442: static AdjustResult> > 443: adjust_bits_from_bounds(const KnownBits& bits, const RangeInt& bounds) { Ok, so this only deals with the unsigned bounds. Is there an analogue for the signed bits? Ah, you could make the name more precise. Suggestion: template static AdjustResult> adjust_bits_from_unsigned_bounds(const KnownBits& bits, const RangeInt& bounds) { Hmm, maybe you only deal with simple intervals, and there the signed and unsigned bounds end up giving you the equivalent info... is this correct? 
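
As background for the `adjust_bits_from_bounds` review comments in this message, the idea of deriving known bits from unsigned bounds can be sketched as follows: every value in `[ulo, uhi]` shares the leading bits on which `ulo` and `uhi` agree, so those bits can be added to `zeros` / `ones`. This is only a sketch; `Bits`, `common_prefix_mask` and `tighten_bits` are hypothetical names, and the loop-based mask computation is an assumption, since the thread does not show how `match_mask` is actually computed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the known-bits pair plus the "non-empty" flag.
struct Bits {
  uint32_t zeros;   // bits known to be 0
  uint32_t ones;    // bits known to be 1
  bool non_empty;   // false when a bit is required to be both 0 and 1
};

// Mask of the leading bits on which lo and hi agree, i.e. all bits strictly
// above the highest differing bit. All ones when lo == hi.
inline uint32_t common_prefix_mask(uint32_t lo, uint32_t hi) {
  uint32_t mismatch = lo ^ hi;
  uint32_t mask = ~uint32_t(0);
  while (mismatch != 0) {  // drop one mask bit per position up to the mismatch
    mismatch >>= 1;
    mask <<= 1;
  }
  return mask;
}

// Add the bits implied by the unsigned range [ulo, uhi] to the known bits.
inline Bits tighten_bits(Bits bits, uint32_t ulo, uint32_t uhi) {
  uint32_t match = common_prefix_mask(ulo, uhi);
  uint32_t zeros = bits.zeros | (match & ~ulo);  // shared prefix bits that are 0
  uint32_t ones  = bits.ones  | (match &  ulo);  // shared prefix bits that are 1
  return {zeros, ones, (zeros & ones) == 0};
}
```

For the made-up range `[0x50, 0x57]` only the low three bits vary, so the sketch reports `ones = 0x50` and marks every other bit above bit 2 as known zero; if the incoming bits already demand that bit 6 be 0, the result is flagged as empty.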
src/hotspot/share/opto/rangeinference.cpp line 447: > 445: // and bounds._hi should share this common prefix in terms of bits > 446: U mismatch = bounds._lo ^ bounds._hi; > 447: // Find the first mismatch, all bits before it is the same in bounds._lo and Suggestion: // Find the first mismatch, all bits before it are the same in bounds._lo and src/hotspot/share/opto/rangeinference.cpp line 454: > 452: // it > 453: U new_zeros = bits._zeros | (match_mask & ~bounds._lo); > 454: U new_ones = bits._ones | (match_mask & bounds._lo); Suggestion: U common_prefix = match_mask & bounds._lo; assert(common_prefix == match_mask & bounds._lo, "common prefix of both"); U neg_common_prefix = match_mask & ~bounds._lo; assert(neg_common_prefix == match_mask & ~bounds._lo, "common negated prefix of both"); U new_zeros = bits._zeros | neg_common_prefix; U new_ones = bits._ones | common_prefix; Just an idea. Up to you. src/hotspot/share/opto/rangeinference.cpp line 456: > 454: U new_ones = bits._ones | (match_mask & bounds._lo); > 455: bool progress = (new_zeros != bits._zeros) || (new_ones != bits._ones); > 456: bool present = ((new_zeros & new_ones) == U(0)); Hmm. It sounds like `present` could also be named `is_non_empty`? src/hotspot/share/opto/rangeinference.cpp line 466: > 464: // not be larger than 64. > 465: // This function is called simple because it deals with a simple intervals (see > 466: // TypeInt at type.hpp). Could we somehow assert that the input bounds are indeed a simple interval? src/hotspot/share/opto/rangeinference.cpp line 509: > 507: // Trivially canonicalize the bounds so that srange._lo and urange._hi are > 508: // both < 0 or >= 0. The same for srange._hi and urange._ulo. See TypeInt for > 509: // detailed explanation. You seem to suggest that `urange._hi` can be `< 0`. But it is unsigned, so that is confusing. This looks like Lemma 3 from TypeInt is relevant here, right? 
You might want to state that here, and still also bring this ASCII art up again here: * Signed: * -----lo=========uhi---------0--------ulo==========hi----- * Unsigned: * 0--------ulo==========hi----------lo=========uhi--------- Also, does `S(urange._lo) > S(urange._hi)` imply `U(srange._lo) > U(srange._hi)`? - If yes, assert it! - If no: are we missing an optimization? src/hotspot/share/opto/rangeinference.cpp line 519: > 517: // This means that there should be no element in the interval > 518: // [S(urange._lo), max_S], tighten urange._lo to min_S > 519: urange._lo = U(std::numeric_limits::min()); You could also add `min_S` and `max_S` in the ASCII art :) ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2809309491 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069971106 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069953917 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069957235 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069968393 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069970304 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069975655 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069985155 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069989449 From rehn at openjdk.org Thu May 1 09:04:47 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 1 May 2025 09:04:47 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java [v2] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 08:36:29 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to enable TestIRFma.java? >> FmaF/D (checked by TestIRFma.java) are supported on riscv, but for some reason we can not enable it easily, but we should enable it. 
>> >> NOTE: the reason I change IRNode matching rules is that, previously it verify the `FINAL CODE` where every platform could have different instruct name; I change it from machOnlyNameRegex to beforeMatchingNameRegex, to make it verify the `PrintIdeal` where every platform share the same names. >> >> Also tested on machine with `asimd` support. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > adjust IR verification Thanks, looks good! test/hotspot/jtreg/compiler/c2/irTests/TestIRFma.java line 91: > 89: @IR(counts = {IRNode.FMSUB, "> 0"}, > 90: applyIfCPUFeature = {"asimd", "true"}) > 91: @IR(counts = {IRNode.FMA_F, "= 1"}, When you read these tests, it is a bit hard to know how that IRNode.XX do the matching. In this case FMSUB is actually not an IRNode but an instruction, which I didn't understand. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24947#pullrequestreview-2809392301 PR Review Comment: https://git.openjdk.org/jdk/pull/24947#discussion_r2070011735 From mli at openjdk.org Thu May 1 09:37:48 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 1 May 2025 09:37:48 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java [v2] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 09:01:58 GMT, Robbin Ehn wrote: > Thanks, looks good! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24947#issuecomment-2844532035 From epeter at openjdk.org Thu May 1 09:49:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 09:49:03 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:59:15 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. 
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with three additional commits since the last revision: > > - new_hi computation > - refer back to the formality section > - clarify where overflow comes from Another batch before lunch :) src/hotspot/share/opto/rangeinference.cpp line 626: > 624: bool TypeIntPrototype::contains(S v) const { > 625: U u(v); > 626: return v >= _srange._lo && v <= _srange._hi && u >= _urange._lo && u <= _urange._hi && _bits.is_satisfied_by(u); Suggestion: return _srange._lo <= v && v <= _srange._hi && _urange._lo <= u && u <= _urange._hi && _bits.is_satisfied_by(u); Optional: improve readability. src/hotspot/share/opto/rangeinference.cpp line 670: > 668: if (i1 == t2 || t2 == Type::TOP) { > 669: return i1; > 670: } Not sure if I got this right, I'm not fully understanding what the `dual` flag does here yet. Is this correct? `meet = intersection` - `dual = false` `join = union` - `dual = true` So if `i1 == t2`, then we can return `i1` in both cases, as it is both the intersection and union. 
But if `t2 == Type::TOP`, and `i1` is not TOP, then `i1` is not the intersection. What am I missing? src/hotspot/share/opto/rangeinference.cpp line 678: > 676: {MIN2(i1->_ulo, i2->_ulo), MAX2(i1->_uhi, i2->_uhi)}, > 677: {i1->_bits._zeros & i2->_bits._zeros, i1->_bits._ones & i2->_bits._ones}}, > 678: MAX2(i1->_widen, i2->_widen), false); Ok, this looks like a union. And below like a intersection. src/hotspot/share/opto/rangeinference.cpp line 938: > 936: > 937: template > 938: const char* TypeIntHelper::bitname(char* buf, size_t buf_size, U zeros, U ones) { Wow, these will look incredibly long for 64bit long values, no? Well, if it ever gets too much in the way, we can still try to find more compressed representations later. Maybe things like: `*..*000` for 8-aligned values. Others: `0..0***`, `00*..*`, `1..1***` etc. `0..0*..*` would be a little unfortunate as we would lose the position where the bits flip. src/hotspot/share/opto/rangeinference.cpp line 1016: > 1014: } else { > 1015: if (verbose) { > 1016: st->print("long:%s..%s ^ %s..%s, bits:%s", Suggestion: st->print("long:%s..%s, %s..%s, bits:%s", That's what you have for the ints above. src/hotspot/share/opto/rangeinference.cpp line 1034: > 1032: } > 1033: } else { > 1034: st->print("long:%s..%s ^ %s..%s", Suggestion: st->print("long:%s..%s, %s..%s", That's what you have for the ints above. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2809373002 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2069998856 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070007333 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070017853 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070094305 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070103170 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070103489 From epeter at openjdk.org Thu May 1 09:49:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 09:49:03 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 08:54:46 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with three additional commits since the last revision: >> >> - new_hi computation >> - refer back to the formality section >> - clarify where overflow comes from > > src/hotspot/share/opto/rangeinference.cpp line 670: > >> 668: if (i1 == t2 || t2 == Type::TOP) { >> 669: return i1; >> 670: } > > Not sure if I got this right, I'm not fully understanding what the `dual` flag does here yet. > Is this correct? > `meet = intersection` - `dual = false` > `join = union` - `dual = true` > > So if `i1 == t2`, then we can return `i1` in both cases, as it is both the intersection and union. > But if `t2 == Type::TOP`, and `i1` is not TOP, then `i1` is not the intersection. > > What am I missing? If this is indeed a bug, we need a test case to catch it :) > src/hotspot/share/opto/rangeinference.cpp line 678: > >> 676: {MIN2(i1->_ulo, i2->_ulo), MAX2(i1->_uhi, i2->_uhi)}, >> 677: {i1->_bits._zeros & i2->_bits._zeros, i1->_bits._ones & i2->_bits._ones}}, >> 678: MAX2(i1->_widen, i2->_widen), false); > > Ok, this looks like a union. 
And below like a intersection. Why do you handle the `widen` differently here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070008861 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070018828 From qamai at openjdk.org Thu May 1 10:48:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 10:48:42 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v55] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Emanuel's reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/b29ff4ac..3a2aa8d4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=54 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=53-54 Stats: 49 lines in 1 file changed: 23 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Thu May 1 10:48:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 10:48:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:34:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - wording >> - grammar, more details for non-existence > > Can you do the analogue with the else (one violation) case? > That one is probably a bit harder, but I have faith in you ;) @eme64 I have addressed all of your comments. > src/hotspot/share/opto/rangeinference.cpp line 443: > >> 441: template >> 442: static AdjustResult> >> 443: adjust_bits_from_bounds(const KnownBits& bits, const RangeInt& bounds) { > > Ok, so this only deals with the unsigned bounds. Is there an analogue for the signed bits? > Ah, you could make the name more precise. > Suggestion: > > template > static AdjustResult> > adjust_bits_from_unsigned_bounds(const KnownBits& bits, const RangeInt& bounds) { > > Hmm, maybe you only deal with simple intervals, and there the signed and unsigned bounds end up giving you the equivalent info... is this correct? 
Conceptually, these deal with unsigned bounds, I thought that would be obvious from the type of `bounds`, but I have changed the name of this function to `unsigned_bounds` to be clearer. > src/hotspot/share/opto/rangeinference.cpp line 456: > >> 454: U new_ones = bits._ones | (match_mask & bounds._lo); >> 455: bool progress = (new_zeros != bits._zeros) || (new_ones != bits._ones); >> 456: bool present = ((new_zeros & new_ones) == U(0)); > > Hmm. It sounds like `present` could also be named `is_non_empty`? I think that there is no difference. > src/hotspot/share/opto/rangeinference.cpp line 466: > >> 464: // not be larger than 64. >> 465: // This function is called simple because it deals with a simple intervals (see >> 466: // TypeInt at type.hpp). > > Could we somehow assert that the input bounds are indeed a simple interval? Done, we can assert that the highest bits of them are the same. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2844629691 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070137714 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070144032 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070138294 From qamai at openjdk.org Thu May 1 10:48:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 10:48:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 08:56:30 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 670: >> >>> 668: if (i1 == t2 || t2 == Type::TOP) { >>> 669: return i1; >>> 670: } >> >> Not sure if I got this right, I'm not fully understanding what the `dual` flag does here yet. >> Is this correct? >> `meet = intersection` - `dual = false` >> `join = union` - `dual = true` >> >> So if `i1 == t2`, then we can return `i1` in both cases, as it is both the intersection and union. 
>> But if `t2 == Type::TOP`, and `i1` is not TOP, then `i1` is not the intersection. >> >> What am I missing? > > If this is indeed a bug, we need a test case to catch it :) No, the subset relation is only reversed in the set of all `CT` instances. In the overall `Type` hierarchy it is still the same, we are trying to find the union of the arguments. If you try to change it to `return dual ? t2 : i1` you would be hit with `=== Meet Not Symmetric ===` errors everywhere :) Similarly, that is why when the 2 arguments are of different kind the result is always `Type::BOTTOM`. >> src/hotspot/share/opto/rangeinference.cpp line 678: >> >>> 676: {MIN2(i1->_ulo, i2->_ulo), MAX2(i1->_uhi, i2->_uhi)}, >>> 677: {i1->_bits._zeros & i2->_bits._zeros, i1->_bits._ones & i2->_bits._ones}}, >>> 678: MAX2(i1->_widen, i2->_widen), false); >> >> Ok, this looks like a union. And below like a intersection. > > Why do you handle the `widen` differently here? Because the join of 2 widens should be the smaller value and the meet of 2 widens should be the larger one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070140871 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070141428 From qamai at openjdk.org Thu May 1 10:48:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 10:48:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 10:39:57 GMT, Quan Anh Mai wrote: >> If this is indeed a bug, we need a test case to catch it :) > > No, the subset relation is only reversed in the set of all `CT` instances. In the overall `Type` hierarchy it is still the same, we are trying to find the union of the arguments. If you try to change it to `return dual ? 
t2 : i1` you would be hit with `=== Meet Not Symmetric ===` errors everywhere :) Similarly, that is why when the 2 arguments are of different kind the result is always `Type::BOTTOM`. I have also added an explanation for this function. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070142995 From qamai at openjdk.org Thu May 1 10:48:49 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 10:48:49 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: <-PwC3GNfGkJx8XVuhDelARrUHi6gX6GsqQJYsVzVdvE=.bf82fc8f-41da-4b36-a9e5-90c2cda94875@github.com> On Thu, 1 May 2025 09:34:56 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with three additional commits since the last revision: >> >> - new_hi computation >> - refer back to the formality section >> - clarify where overflow comes from > > src/hotspot/share/opto/rangeinference.cpp line 938: > >> 936: >> 937: template >> 938: const char* TypeIntHelper::bitname(char* buf, size_t buf_size, U zeros, U ones) { > > Wow, these will look incredibly long for 64bit long values, no? > Well, if it ever gets too much in the way, we can still try to find more compressed representations later. > Maybe things like: `*..*000` for 8-aligned values. > Others: `0..0***`, `00*..*`, `1..1***` etc. > `0..0*..*` would be a little unfortunate as we would lose the position where the bits flip. It only gets printed when the `TypeInt` instance is printed verbosely; a normal `dump` does not include the bit information.
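Editorial sketch of the kind of bit string discussed above for `bitname`: one character per bit, most significant bit first, with `0` for a known-zero bit, `1` for a known-one bit, and `*` for an unknown bit. This is an assumption-laden illustration, not the actual `TypeIntHelper::bitname`; the name `render_known_bits`, the use of `std::string`, and the uncompressed full-width output are all choices made for this example only.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>

// Hypothetical helper: render known bits as a string of '0', '1' and '*'.
// If a bit is set in both masks (a contradiction), this sketch prints '0'.
template <class U>
std::string render_known_bits(U zeros, U ones) {
  static_assert(std::is_unsigned<U>::value, "U must be an unsigned integer type");
  const int width = static_cast<int>(sizeof(U) * 8);
  std::string out;
  out.reserve(width);
  // Walk from the most significant bit down to bit 0.
  for (int i = width - 1; i >= 0; i--) {
    U bit = static_cast<U>(U(1) << i);
    if ((zeros & bit) != 0) {
      out.push_back('0');
    } else if ((ones & bit) != 0) {
      out.push_back('1');
    } else {
      out.push_back('*');
    }
  }
  return out;
}
```

With an 8-bit type and only the low three bits known to be zero, this yields `*****000`, the uncompressed counterpart of the `*..*000` form suggested in the review. For a 64-bit type the string is 64 characters long, which is the verbosity concern raised above.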
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070142217 From qamai at openjdk.org Thu May 1 10:48:46 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 10:48:46 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 08:04:02 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 454: >> >>> 452: // it >>> 453: U new_zeros = bits._zeros | (match_mask & ~bounds._lo); >>> 454: U new_ones = bits._ones | (match_mask & bounds._lo); >> >> Suggestion: >> >> U common_prefix = match_mask & bounds._lo; >> assert(common_prefix == match_mask & bounds._lo, "common prefix of both"); >> U neg_common_prefix = match_mask & ~bounds._lo; >> assert(neg_common_prefix == match_mask & ~bounds._lo, "common negated prefix of both"); >> U new_zeros = bits._zeros | neg_common_prefix; >> U new_ones = bits._ones | common_prefix; >> >> Just an idea. Up to you. > > Hmm, I think your solution is understandable already, not sure mine is better. I'll still leave it for you here, feel free to mark it resolved. Nice suggestions, I have done so with a little modification. >> src/hotspot/share/opto/rangeinference.cpp line 509: >> >>> 507: // Trivially canonicalize the bounds so that srange._lo and urange._hi are >>> 508: // both < 0 or >= 0. The same for srange._hi and urange._ulo. See TypeInt for >>> 509: // detailed explanation. >> >> You seem to suggest that `urange._hi` can be `< 0`. But it is unsigned, so that is confusing. >> This looks like Lemma 3 from TypeInt is relevant here, right? You might want to state that here, >> and still also bring this ASCII art up again here: >> >> * Signed: >> * -----lo=========uhi---------0--------ulo==========hi----- >> * Unsigned: >> * 0--------ulo==========hi----------lo=========uhi--------- >> >> Also, does `S(urange._lo) > S(urange._hi)` imply `U(srange._lo) > U(srange._hi)`? >> - If yes, assert it! 
>> - If no: are we missing an optimization? > > It could also be that I'm missing some things here... So it could be good to refer to the exact things in `TypeInt` and the Lemma numbers you are using here. Otherwise it is a lot of work for the reader to check what you are doing here. Here the bounds may take any value, so `S(urange._lo) > S(urange._hi)` does not yet imply `U(srange._lo) > U(srange._hi)`. I hope the description I have added is now clearer. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070137968 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070143606 From mli at openjdk.org Thu May 1 11:37:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 1 May 2025 11:37:22 GMT Subject: RFR: 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java Message-ID: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> Hi, Can you help review this patch? Originally, I was going to enable all test cases on riscv in this test file. But it seems there was already an attempt to implement RoundDoubleModeV (which is IRNode.ROUND_DOUBLE_MODE_V) in https://github.com/openjdk/jdk/pull/21164, which failed because of a performance regression. So I'll enable only part of the test cases in this PR. Thanks!
------------- Commit messages: - not enable RoundDoubleModeV - initial commit Changes: https://git.openjdk.org/jdk/pull/24983/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24983&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356030 Stats: 24 lines in 1 file changed: 13 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/24983.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24983/head:pull/24983 PR: https://git.openjdk.org/jdk/pull/24983 From duke at openjdk.org Thu May 1 12:19:22 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 1 May 2025 12:19:22 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v5] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'openjdk:master' into JDK-8322174 - Merge branch 'master' into JDK-8322174 - Merge master - num_8b_elems_in_vec --> nof_vec_elems - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. 
- 8322174: RISC-V: C2 VectorizedHashCode RVV Version ------------- Changes: https://git.openjdk.org/jdk/pull/17413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=04 Stats: 531 lines in 6 files changed: 529 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From kvn at openjdk.org Thu May 1 13:53:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 13:53:50 GMT Subject: RFR: 8355896: Lossy narrowing cast of JVMCINMethodData::size In-Reply-To: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> References: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> Message-ID: On Wed, 30 Apr 2025 13:10:19 GMT, Boris Ulasevich wrote: > In https://github.com/openjdk/jdk/pull/21276 mutable_data, which includes relocations, metadata, and jvmci_data, was moved to a separately malloc'ed blob. The nmethod (a CodeBlob) holds a pointer to the mutable_data blob and stores its internal offsets. > > As part of that change, I reused the former uint16_t offset field to store jvmci_data_size. This turned out to be incorrect, since jvmci_data can exceed 64 KB (as shown in https://github.com/openjdk/jdk/pull/24753). > > The most direct fix would be to change jvmci_data_size to uint, placing it alongside other int fields to avoid padding. However, in fact on my build this increases the size of the nmethod structure from 240 to 248 bytes, which I would prefer to avoid. > > Instead, I propose storing metadata_size in the existing uint16_t field. The average metadata_size is approximately 140 bytes, and the maximum observed in practice is around 4 KB. While, like oops_size, this value is not formally guaranteed to remain below 64 KB, no cases have been observed where this limit is exceeded. 
A GUARANTEE check is included to immediately catch any overflow if it ever occurs. > > Testing: in progress. My testing passed. @bulasevich you need second review. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24965#pullrequestreview-2809825930 PR Comment: https://git.openjdk.org/jdk/pull/24965#issuecomment-2844884339 From duke at openjdk.org Thu May 1 14:17:06 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 1 May 2025 14:17:06 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v6] In-Reply-To: References: Message-ID: <5WFnnB_4JCdBg0_OLfMnq645RuC6VLhqciD2-N5VZ3I=.534d353c-fb0e-44f5-b1b7-0e77410ded8f@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: Fixed git rebase artifacts. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/712cf05d..9ba27686 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=04-05 Stats: 52 lines in 1 file changed: 0 ins; 52 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Thu May 1 14:31:46 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 1 May 2025 14:31:46 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v3] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 12:02:25 GMT, Robbin Ehn wrote: > Can you resolve the conflict by merging with master? Fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2844949777 From asmehra at openjdk.org Thu May 1 15:12:52 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 1 May 2025 15:12:52 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: <6QqEVIYBum6lJLSUHs_yZ-Zbk1vEDXqqQBYM6usjoJ8=.295c3db4-c458-42c9-a5ea-adc585a645d0@github.com> References: <6QqEVIYBum6lJLSUHs_yZ-Zbk1vEDXqqQBYM6usjoJ8=.295c3db4-c458-42c9-a5ea-adc585a645d0@github.com> Message-ID: On Thu, 1 May 2025 01:51:07 GMT, Vladimir Ivanov wrote: > assert(success || !AOTCodeCache::is_dumping_adapters(), ""); This condition `!AOTCodeCache::is_dumping_adapters()` in the assert is not very intuitive. I think what we need to assert is that future stores in the AOT code cache are disabled. So having a method like `AOTCodeCache::is_store_disabled()` would better communicate the intent here. But I don't mind keeping this condition for this initial PR. I would just suggest adding a better assert message like: ```assert(success || !AOTCodeCache::is_dumping_adapters(), "storing of adapter must be disabled");``` And I think we should also be setting `_adapter_caching` to false in `report_load_failure` and `report_store_failure` to be consistent, otherwise we may end up in a situation where `AOTAdapterCaching` is false but `_adapter_caching` is true. In fact, I feel we should only be setting `_adapter_caching` and not `AOTAdapterCaching` in `report_load/store_failure` because `_adapter_caching` is the flag used to gate store/load operations.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2070387521 From duke at openjdk.org Thu May 1 15:14:52 2025 From: duke at openjdk.org (Mohamed Issa) Date: Thu, 1 May 2025 15:14:52 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v9] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 14:20:57 GMT, Mohamed Issa wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Create separate tanh micro-benchmark module to avoid noise in MathBench > > @TobiHartmann @vnkozlov Ok to run this through Oracle test framework before integration? > @missa-prime I launched some internal testing :) @eme64 How are the tests going? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2845037643 From epeter at openjdk.org Thu May 1 15:06:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 15:06:05 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: <-PwC3GNfGkJx8XVuhDelARrUHi6gX6GsqQJYsVzVdvE=.bf82fc8f-41da-4b36-a9e5-90c2cda94875@github.com> References: <-PwC3GNfGkJx8XVuhDelARrUHi6gX6GsqQJYsVzVdvE=.bf82fc8f-41da-4b36-a9e5-90c2cda94875@github.com> Message-ID: On Thu, 1 May 2025 10:41:43 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 938: >> >>> 936: >>> 937: template >>> 938: const char* TypeIntHelper::bitname(char* buf, size_t buf_size, U zeros, U ones) { >> >> Wow, these will look incredibly long for 64bit long values, no? >> Well, if it ever gets too much in the way, we can still try to find more compressed representations later. >> Maybe things like: `*..*000` for 8-aligned values. >> Others: `0..0***`, `00*..*`, `1..1***` etc. >> `0..0*..*` would be a little unfortunate as we would lose the position where the bits flip. > > It only gets printed when we print verbose the `TypeInt` instance, for normal `dump` bit information is not present. 
Hmm, maybe we can find some way to only print the bits if they give "additional information" that is not given by the ranges. Otherwise, we would not print alignment information, and that's quite a shame. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070379738 From kvn at openjdk.org Thu May 1 15:46:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 15:46:51 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: <6QqEVIYBum6lJLSUHs_yZ-Zbk1vEDXqqQBYM6usjoJ8=.295c3db4-c458-42c9-a5ea-adc585a645d0@github.com> Message-ID: On Thu, 1 May 2025 15:10:09 GMT, Ashutosh Mehra wrote: >> Maybe add an assert here? >> >> >> bool success = AOTCodeCache::store_code_blob(...); >> assert(success || !AOTCodeCache::is_dumping_adapters(), ""); > >> assert(success || !AOTCodeCache::is_dumping_adapters(), ""); > > This condtion `!AOTCodeCache::is_dumping_adapters()` in the assert is not very intuitive. I think what we need to assert is future stores in the aot code cache are disabled. So having a method like `AOTCodeCache::is_store_disabled()` would better communicate the intent here. But I don't mind keeping this condition for this initial PR. I will just suggest to add a better assert message like: > > ```assert(success || !AOTCodeCache::is_dumping_adapters(), "storing of adapter must be disabled");``` > > And I think we should also be setting `_adapter_caching` to false in `report_load_failure` and `report_store_failure` to be consistent, otherwise we may end up in a situation where `AOTAdapterCaching` is false but `_adapter_caching` is true. In fact, I feel we should only be setting `_adapter_caching` and not `AOTAdapterCaching` in `report_load/store_failure` because `_adapter_caching` is the flag used to gate store/load operations. Thank you, @ashu-mehra. You have good points. I will work on them. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2070428589 From epeter at openjdk.org Thu May 1 16:01:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 16:01:20 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v55] In-Reply-To: References: Message-ID: <1c6Y052jDqIe32X5irgiw7JK4Q9iNHNqxtC3KBXRLGU=.eb1dae8c-cca7-4431-ad10-90f6510d277d@github.com> On Thu, 1 May 2025 10:48:42 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's reviews Made it down through `type.cpp`. Will have to do a pass through the rest later. And then go through the whole thing again. I feel like we are making good progress on this now :) src/hotspot/share/opto/rangeinference.cpp line 694: > 692: return i1; > 693: } > 694: const CT* i2 = t2->try_cast(); And what is the significance of the `dual` state of i1 and `i2`? 
Should their `dual` state be the same? Can we add some assert here? src/hotspot/share/opto/rangeinference.hpp line 185: > 183: > 184: template > 185: static const Type* int_type_narrow(const CT* nt, const CT* ot); Suggestion: static const Type* int_type_widen(const CT* new_type, const CT* old_type, const CT* limit_type); template static const Type* int_type_narrow(const CT* new_type, const CT* old_type, const CT* limit_type); For consistency with its definition. src/hotspot/share/opto/type.cpp line 1806: > 1804: bool TypeInt::contains(jint i) const { > 1805: juint u = i; > 1806: return i >= _lo && i <= _hi && u >= _ulo && u <= _uhi && _bits.is_satisfied_by(u); Suggestion: return _lo <= i && i <= _hi && _ulo <= u && u <= _uhi && _bits.is_satisfied_by(u); Optional, for readability. src/hotspot/share/opto/type.cpp line 1809: > 1807: } > 1808: > 1809: bool TypeInt::contains(const TypeInt* t) const { Is dual allowed here, or could we assert? src/hotspot/share/opto/type.cpp line 1848: > 1846: // The widen bits must be allowed to run freely through the graph. > 1847: return (new TypeInt(TypeIntPrototype{{ft->_lo, ft->_hi}, {ft->_ulo, ft->_uhi}, ft->_bits}, > 1848: this->_widen, false))->hashcons(); Why not use `TypeInt::make`? src/hotspot/share/opto/type.cpp line 1930: > 1928: bool TypeLong::contains(jlong i) const { > 1929: julong u = i; > 1930: return i >= _lo && i <= _hi && u >= _ulo && u <= _uhi && _bits.is_satisfied_by(u); Suggestion: return _lo <= i && i <= _hi && _ulo <= u && u <= _uhi && _bits.is_satisfied_by(u); For readability. 
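For readers following the `contains` suggestion above, here is a minimal standalone sketch of the constraint set the PR description gives (`lo`/`hi` signed bounds, `ulo`/`uhi` unsigned bounds, `zeros`/`ones` known bits). The names `KnownBitsSketch`, `IntTypeSketch` and `small_range_sketch` are hypothetical and are not the HotSpot classes; the sketch only mirrors the shape of the check and assumes the constraints have already been canonicalized.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the known-bits part of a TypeInt (not HotSpot code).
struct KnownBitsSketch {
  uint32_t zeros;  // bits that must be 0 in every member value
  uint32_t ones;   // bits that must be 1 in every member value

  bool is_satisfied_by(uint32_t v) const {
    return (v & zeros) == 0 && (v & ones) == ones;
  }
};

// Hypothetical stand-in for a TypeInt: signed bounds, unsigned bounds, known bits.
struct IntTypeSketch {
  int32_t lo, hi;     // signed range [lo, hi]
  uint32_t ulo, uhi;  // unsigned range [ulo, uhi]
  KnownBitsSketch bits;

  // Same ordering as the suggested form: lower bound, value, upper bound.
  bool contains(int32_t i) const {
    uint32_t u = static_cast<uint32_t>(i);
    return lo <= i && i <= hi && ulo <= u && u <= uhi && bits.is_satisfied_by(u);
  }
};

// The example from the PR description: a value in signed [0, 3] also lies in
// unsigned [0, 3] and has every bit except the last two unset.
inline IntTypeSketch small_range_sketch() {
  return IntTypeSketch{0, 3, 0u, 3u, KnownBitsSketch{~uint32_t{3}, 0u}};
}
```

With these assumptions, `small_range_sketch().contains(3)` holds while `contains(4)` and `contains(-1)` do not, since each of the three constraint kinds rejects them.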
------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2810044167 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070423848 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070413425 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070420492 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070430474 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070432896 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070437258 From qamai at openjdk.org Thu May 1 16:05:03 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 1 May 2025 16:05:03 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: <-PwC3GNfGkJx8XVuhDelARrUHi6gX6GsqQJYsVzVdvE=.bf82fc8f-41da-4b36-a9e5-90c2cda94875@github.com> Message-ID: On Thu, 1 May 2025 15:03:25 GMT, Emanuel Peter wrote: >> It only gets printed when we print verbose the `TypeInt` instance, for normal `dump` bit information is not present. > > Hmm, maybe we can find some way to only print the bits if they give "additional information" that is not given by the ranges. Otherwise, we would not print alignment information, and that's quite a shame. Maybe that's worth thinking about, the thing is that bit information is exceptionally long and may not be universally interesting. I think we can think more about it later. My rough idea is that it may be better to print it when it is interesting. For example, we can print alignment properties at the memory access. 
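As a rough illustration of how alignment could be read off the known bits rather than printing the full bit pattern, the sketch below counts the low bits that are known to be zero. The helper names are hypothetical and nothing like them is claimed to exist in the patch; the `zeros` mask is assumed to follow the PR description, where a set bit means the corresponding value bit must be 0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of consecutive low bits known to be zero.
inline unsigned known_trailing_zero_bits(uint32_t zeros) {
  unsigned n = 0;
  while (n < 32 && ((zeros >> n) & 1u) != 0) {
    ++n;
  }
  return n;
}

// Hypothetical helper: the alignment implied by the known-zero low bits.
// Returned as 64-bit so that "all 32 bits are zero" does not overflow.
inline uint64_t known_alignment(uint32_t zeros) {
  return uint64_t{1} << known_trailing_zero_bits(zeros);
}
```

For a mask whose low three bits are set, this yields an alignment of 8, which corresponds to the `*..*000` pattern mentioned earlier in the thread.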
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070451373 From epeter at openjdk.org Thu May 1 16:01:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 16:01:20 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v55] In-Reply-To: <1c6Y052jDqIe32X5irgiw7JK4Q9iNHNqxtC3KBXRLGU=.eb1dae8c-cca7-4431-ad10-90f6510d277d@github.com> References: <1c6Y052jDqIe32X5irgiw7JK4Q9iNHNqxtC3KBXRLGU=.eb1dae8c-cca7-4431-ad10-90f6510d277d@github.com> Message-ID: <67-_LntRKD0jtEAOBBT3BkFrXZVkV5m5fQUIAFwVY94=.8b7fe6d8-ec15-4899-bb69-63e3adb43c23@github.com> On Thu, 1 May 2025 15:47:56 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/opto/type.cpp line 1848: > >> 1846: // The widen bits must be allowed to run freely through the graph. >> 1847: return (new TypeInt(TypeIntPrototype{{ft->_lo, ft->_hi}, {ft->_ulo, ft->_uhi}, ft->_bits}, >> 1848: this->_widen, false))->hashcons(); > > Why not use `TypeInt::make`? Also: when do you have to do `hashcons` and when is it ok without? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070447069 From epeter at openjdk.org Thu May 1 16:01:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 1 May 2025 16:01:20 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 10:43:04 GMT, Quan Anh Mai wrote: >> No, the subset relation is only reversed in the set of all `CT` instances. In the overall `Type` hierarchy it is still the same, we are trying to find the union of the arguments. If you try to change it to `return dual ? t2 : i1` you would be hit with `=== Meet Not Symmetric ===` errors everywhere :) Similarly, that is why when the 2 arguments are of different kind the result is always `Type::BOTTOM`. 
> > I have also added an explanation for this function. Hmm ok. I think someone with a deeper knowledge of the type system has to check this. To me this does not make sense, but you clearly have much more knowledge here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070427607 From iveresov at openjdk.org Thu May 1 16:58:39 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 1 May 2025 16:58:39 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v9] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Port 8355915: [leyden] Crash in MDO clearing the unloaded array type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/b937681e..ee6bd11d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=07-08 Stats: 17 lines in 4 files changed: 6 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From vlivanov at openjdk.org Thu May 1 17:12:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 1 May 2025 17:12:49 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: <6QqEVIYBum6lJLSUHs_yZ-Zbk1vEDXqqQBYM6usjoJ8=.295c3db4-c458-42c9-a5ea-adc585a645d0@github.com> Message-ID: On Thu, 1 May 2025 
15:44:11 GMT, Vladimir Kozlov wrote: >>> assert(success || !AOTCodeCache::is_dumping_adapters(), ""); >> >> This condition `!AOTCodeCache::is_dumping_adapters()` in the assert is not very intuitive. I think what we need to assert is that future stores in the AOT code cache are disabled. So having a method like `AOTCodeCache::is_store_disabled()` would better communicate the intent here. But I don't mind keeping this condition for this initial PR. I would just suggest adding a better assert message like: >> >> ```assert(success || !AOTCodeCache::is_dumping_adapters(), "storing of adapter must be disabled");``` >> >> And I think we should also be setting `_adapter_caching` to false in `report_load_failure` and `report_store_failure` to be consistent, otherwise we may end up in a situation where `AOTAdapterCaching` is false but `_adapter_caching` is true. In fact, I feel we should only be setting `_adapter_caching` and not `AOTAdapterCaching` in `report_load/store_failure` because `_adapter_caching` is the flag used to gate store/load operations. > > Thank you, @ashu-mehra. You have good points. I will work on them. FTR I suggested `!AOTCodeCache::is_dumping_adapters()` because that's the guarding check for the `AOTCodeCache::store_code_blob()` call. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2070535559 From kvn at openjdk.org Thu May 1 18:39:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 18:39:50 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v12] In-Reply-To: References: <6QqEVIYBum6lJLSUHs_yZ-Zbk1vEDXqqQBYM6usjoJ8=.295c3db4-c458-42c9-a5ea-adc585a645d0@github.com> Message-ID: On Thu, 1 May 2025 17:10:17 GMT, Vladimir Ivanov wrote: >> Thank you, @ashu-mehra. You have good points. I will work on them. > > FTR I suggested `!AOTCodeCache::is_dumping_adapters()` because that's the guarding check for the `AOTCodeCache::store_code_blob()` call. I will add an assert message but keep the check.
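To make the invariant behind that assert concrete, here is a small self-contained model of it: a failed store must leave adapter dumping disabled, so "store failed" and "still dumping" can never both be true. `CodeCacheSketch` and `store_with_check` are hypothetical names, the capacity-based failure is an invented stand-in for a real store failure, and standard `assert` is used in place of HotSpot's two-argument `assert` macro.

```cpp
#include <cassert>

// Hypothetical model of the gating behaviour discussed above (not AOTCodeCache).
class CodeCacheSketch {
  bool _dumping_adapters = true;  // stands in for the flag gating store operations
  int _capacity;                  // invented limit, used only to provoke a failure
  int _stored = 0;

public:
  explicit CodeCacheSketch(int capacity) : _capacity(capacity) {}

  bool is_dumping_adapters() const { return _dumping_adapters; }

  // Returns false on failure and disables all further stores.
  bool store_code_blob() {
    if (!is_dumping_adapters()) {
      return false;
    }
    if (_stored >= _capacity) {
      _dumping_adapters = false;  // what a store-failure report would do in this model
      return false;
    }
    ++_stored;
    return true;
  }
};

// Call site carrying the assert shape from the review, with a message attached.
inline bool store_with_check(CodeCacheSketch& cache) {
  bool success = cache.store_code_blob();
  assert((success || !cache.is_dumping_adapters()) &&
         "storing of adapter must be disabled");
  return success;
}
```

In this model the first store into a cache of capacity 1 succeeds, the second fails, and afterwards `is_dumping_adapters()` reports `false`, so the assert holds on both paths.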
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24740#discussion_r2070638283 From kvn at openjdk.org Thu May 1 18:53:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 18:53:38 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v16] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Revert log_warning change. Replace _adapter_caching check with flag check. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24740/files - new: https://git.openjdk.org/jdk/pull/24740/files/286e0e67..325efec0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=14-15 Stats: 20 lines in 3 files changed: 8 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From vlivanov at openjdk.org Thu May 1 19:27:49 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 1 May 2025 19:27:49 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v16] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 18:53:38 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. 
>> >> Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Revert log_warning change. 
Replace _adapter_caching check with flag check. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2810533587 From kvn at openjdk.org Thu May 1 15:53:36 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 15:53:36 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v15] In-Reply-To: References: Message-ID: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: - Merge branch 'master' into JDK-8350209 - renaming and new assert - Address Vladimir's comments - address Ioi's comments - Remove unused method - Fix C strings caching - Merge branch 'master' into JDK-8350209 - Merge branch 'master' into JDK-8350209 - Downgraded UL as asked. Added synchronization to C strings caching. - Fix message - ... 
and 6 more: https://git.openjdk.org/jdk/compare/e2ae50d8...286e0e67 ------------- Changes: https://git.openjdk.org/jdk/pull/24740/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24740&range=14 Stats: 3297 lines in 51 files changed: 2827 ins; 200 del; 270 mod Patch: https://git.openjdk.org/jdk/pull/24740.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24740/head:pull/24740 PR: https://git.openjdk.org/jdk/pull/24740 From iveresov at openjdk.org Thu May 1 19:34:34 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 1 May 2025 19:34:34 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v10] In-Reply-To: References: Message-ID: <4uRz9S2VvUHduPnG2Vnh3v-AbRtoB86mM1A9sJBLZ30=.840a3c9b-ada1-4ba4-b8d8-af4e94607556@github.com> > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Fix semantics change from the previous commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/ee6bd11d..014b0ec5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From kvn at openjdk.org Thu May 1 18:53:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 18:53:40 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v15] In-Reply-To: References: Message-ID: <2EBIRKcMsbi0pptQJnsclCch18L6caIEhHgC3VPlo8I=.e0372c4d-515a-451e-ac4b-dbf2c94e1e43@github.com> On Thu, 1 May 2025 15:53:36 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement. 
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: > > - Merge branch 'master' into JDK-8350209 > - renaming and new assert > - Address Vladimir's comments > - address Ioi's comments > - Remove unused method > - Fix C strings caching > - Merge branch 'master' into JDK-8350209 > - Merge branch 'master' into JDK-8350209 > - Downgraded UL as asked. Added synchronization to C strings caching. > - Fix message > - ... and 6 more: https://git.openjdk.org/jdk/compare/e2ae50d8...286e0e67 I reverted `log_warning()` back to `log_info()` in the `report_*_failure()` methods. I removed the `AOTCodeCache::_adapter_caching` field and use the `AOTAdapterCaching` flag check instead. Setting the flag to `false` in case of a failure makes more sense now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2845485585 From asmehra at openjdk.org Thu May 1 19:53:50 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 1 May 2025 19:53:50 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v16] In-Reply-To: References: Message-ID: <47yR6by-cdpqHdDV9K_KcREg9d7Ii6tUvciBv_TWQKI=.42b886e4-befc-440a-bb16-c73df5db5d19@github.com> On Thu, 1 May 2025 18:53:38 GMT, Vladimir Kozlov wrote: >> [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. >> >> We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. >> >> Short running Java application can see several percents improvement.
I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): >> >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) >> >> (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed >> 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) >> >> >> New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): >> >> >> -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters >> -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching >> -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure >> >> By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. >> This flag is ignored when `AOTCache` is not specified. >> >> To use AOT adapters follow process described in JEP 483: >> >> >> java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App >> java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar >> java -XX:AOTCache=app.aot -cp app.jar App >> >> >> There are several new UL flag combinations to trace the AOT code caching process: >> >> >> -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs >> >> >> @ashu-mehra is main author of changes. He implemented adapters caching. >> I did main framework (`AOTCodeCache` class) for saving and loading AOT code. >> >> Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Revert log_warning change. Replace _adapter_caching check with flag check. lgtm ------------- Marked as reviewed by asmehra (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/24740#pullrequestreview-2810579866 From swen at openjdk.org Thu May 1 17:35:56 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 1 May 2025 17:35:56 GMT Subject: RFR: 8356044: Use Double::hashCode and Long::hashCode in java.vm.ci.meta Message-ID: <8SlBOjUBPGyZbR9GxEBZlLzOiNPbdws1GTZ4gGY8v9c=.fdefa26b-52ee-48f9-b814-3981b79f6012@github.com> Similar to #24959 and #24971 and #24987, AbstractProfiledItem/PrimitiveConstant in java.vm.ci.meta can also be simplified similarly. Replace manual bitwise operations in hashCode implementations of java.vm.ci.meta.AbstractProfiledItem/java.vm.ci.meta.PrimitiveConstant with Long::hashCode/Double.hashCode. ------------- Commit messages: - Use Double::hashCode & Long::hashCode Changes: https://git.openjdk.org/jdk/pull/24988/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24988&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356044 Stats: 8 lines in 2 files changed: 0 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24988.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24988/head:pull/24988 PR: https://git.openjdk.org/jdk/pull/24988 From kvn at openjdk.org Thu May 1 21:05:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 21:05:59 GMT Subject: RFR: 8350209: Preserve adapters in AOT cache [v16] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 19:24:46 GMT, Vladimir Ivanov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert log_warning change. Replace _adapter_caching check with flag check. > > Looks good. 
Thank you again, @iwanowww and @ashu-mehra ------------- PR Comment: https://git.openjdk.org/jdk/pull/24740#issuecomment-2845771528 From kvn at openjdk.org Thu May 1 21:06:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 1 May 2025 21:06:01 GMT Subject: Integrated: 8350209: Preserve adapters in AOT cache In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 19:11:47 GMT, Vladimir Kozlov wrote: > [JEP 483](https://bugs.openjdk.org/browse/JDK-8315737) preserves class information in AOT cache which helps Java startup performance. > > We should also preserve adapters (i2c, c2i) to further improve performance of class linking where adapters are generated. > > Short running Java application can see several percents improvement. I got 6% improvement when ran `HelloWorld.java` on Linux-x64 Ice Lake CPU (2.5Ghz): > > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0299401 +- 0.0000504 seconds time elapsed ( +- 0.17% ) > > (perf stat -r 100 java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed > 0.0318654 +- 0.0000535 seconds time elapsed ( +- 0.17% ) > > > New diagnostic flags are introduced (use `-XX:+UnlockDiagnosticVMOptions` to unlock them): > > > -XX:+AOTAdapterCaching - Enable or disable saving and restoring i2c2i adapters > -XX:AOTCodeMaxSize=10*M - buffer size in bytes for AOT code caching > -XX:+AbortVMOnAOTCodeFailure - Abort VM on the first occurrence of AOT code caching failure > > By default `AOTAdapterCaching` is `false` and enabled ergonomically when `-XX:AOTCache` is specified. > This flag is ignored when `AOTCache` is not specified. 
> > To use AOT adapters follow process described in JEP 483: > > > java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar App > java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar > java -XX:AOTCache=app.aot -cp app.jar App > > > There are several new UL flag combinations to trace the AOT code caching process: > > > -Xlog:aot+codecache+init -Xlog:aot+codecache+exit -Xlog:aot+codecache+stubs > > > @ashu-mehra is main author of changes. He implemented adapters caching. > I did main framework (`AOTCodeCache` class) for saving and loading AOT code. > > Tested tier1-6,10, which includes tests with `AOTClassLinking` enabled. Also Xcomp,stress and JCK. This pull request has now been integrated. Changeset: aae2bb62 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/aae2bb62499855e3da33c06547d437e49c91a14b Stats: 3300 lines in 51 files changed: 2830 ins; 200 del; 270 mod 8350209: Preserve adapters in AOT cache Co-authored-by: Ashutosh Mehra Reviewed-by: vlivanov, asmehra, ihse, iklam ------------- PR: https://git.openjdk.org/jdk/pull/24740 From qamai at openjdk.org Fri May 2 00:50:31 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 00:50:31 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Emanuel's reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/3a2aa8d4..5616c23e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=55 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=54-55 Stats: 47 lines in 4 files changed: 20 ins; 5 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri May 2 00:50:32 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 00:50:32 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v55] In-Reply-To: <1c6Y052jDqIe32X5irgiw7JK4Q9iNHNqxtC3KBXRLGU=.eb1dae8c-cca7-4431-ad10-90f6510d277d@github.com> References: <1c6Y052jDqIe32X5irgiw7JK4Q9iNHNqxtC3KBXRLGU=.eb1dae8c-cca7-4431-ad10-90f6510d277d@github.com> Message-ID: On Thu, 1 May 2025 15:40:29 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/opto/rangeinference.cpp line 694: > >> 692: return i1; >> 693: } >> 694: const CT* i2 = t2->try_cast(); > > And what is the significance of the `dual` state of i1 and `i2`? Should their `dual` state be the same? > Can we add some assert here? 
I decide to make `TypeIntHelper` a friend of `TypeInt` and use `i1->_is_dual` here instead. I think it makes it clearer. > src/hotspot/share/opto/type.cpp line 1809: > >> 1807: } >> 1808: >> 1809: bool TypeInt::contains(const TypeInt* t) const { > > Is dual allowed here, or could we assert? Added this assert and another for the other `contains` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070966739 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070967027 From qamai at openjdk.org Fri May 2 00:50:32 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 00:50:32 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v55] In-Reply-To: <67-_LntRKD0jtEAOBBT3BkFrXZVkV5m5fQUIAFwVY94=.8b7fe6d8-ec15-4899-bb69-63e3adb43c23@github.com> References: <1c6Y052jDqIe32X5irgiw7JK4Q9iNHNqxtC3KBXRLGU=.eb1dae8c-cca7-4431-ad10-90f6510d277d@github.com> <67-_LntRKD0jtEAOBBT3BkFrXZVkV5m5fQUIAFwVY94=.8b7fe6d8-ec15-4899-bb69-63e3adb43c23@github.com> Message-ID: On Thu, 1 May 2025 15:58:00 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/type.cpp line 1848: >> >>> 1846: // The widen bits must be allowed to run freely through the graph. >>> 1847: return (new TypeInt(TypeIntPrototype{{ft->_lo, ft->_hi}, {ft->_ulo, ft->_uhi}, ft->_bits}, >>> 1848: this->_widen, false))->hashcons(); >> >> Why not use `TypeInt::make`? > > Also: when do you have to do `hashcons` and when is it ok without? > Why not use `TypeInt::make`? Since we know that the bounds are canonical here, just use the constructor directly saves us the need to do canonicalization. > Also: when do you have to do `hashcons` and when is it ok without? We have to use `hashcons` everywhere except at `xdual`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070967658 From pbk at openjdk.org Fri May 2 00:54:07 2025 From: pbk at openjdk.org (Peter B. 
Kessler) Date: Fri, 2 May 2025 00:54:07 GMT Subject: RFR: 8354347: Increase the default padding size for aarch64 in JDK code. Message-ID: Increase the default padding for C++ fields to avoid false sharing. ------------- Commit messages: - JDK-8354347: Update copyright date. - JDK-8354347: Increase the default padding size for aarch64 in JDK code. Changes: https://git.openjdk.org/jdk/pull/24994/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24994&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354347 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24994.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24994/head:pull/24994 PR: https://git.openjdk.org/jdk/pull/24994 From qamai at openjdk.org Fri May 2 00:58:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 00:58:10 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: Message-ID: <4J2bhK1v1UcrBUtW7tSR6KR9lcYZ2IrUYxeic6vwfZg=.2c3489ac-dd4e-4a7b-97d6-5deae5223354@github.com> On Thu, 1 May 2025 07:34:22 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - wording >> - grammar, more details for non-existence > > Can you do the analogue with the else (one violation) case? > That one is probably a bit harder, but I have faith in you ;) @eme64 Thanks, yes I feel that the progress is much better now. Hope we can finish this soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2846082085 From qamai at openjdk.org Fri May 2 00:58:10 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 00:58:10 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 15:43:23 GMT, Emanuel Peter wrote: >> I have also added an explanation for this function. > > Hmm ok. 
> I think someone with a deeper knowledge of the type system has to check this. To me this does not make sense, but you clearly have much more knowledge here.

I hope it is clearer now. This function always calculates the union of 2 `Type` instances. It is just that the `TypeInt`s have their subset relationship reversed if `_is_dual` is `true`, which makes it look like we are calculating the `join` but only when both arguments are `TypeInt`s. This comes from the fact that the `meet` of 2 `Type`s is the dual of the join of the 2 duals of the incoming `Type`s. Of course this duality dance is pretty convoluted and I am thinking about getting rid of it and calculating the join like a normal person.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2070971781

From jkarthikeyan at openjdk.org Fri May 2 04:44:37 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 2 May 2025 04:44:37 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11]
In-Reply-To:
References:
Message-ID:

> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                                   Baseline                  Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!

Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:

 - Address more comments, make test and benchmark more exhaustive
 - Merge from master
 - Fix copyright after merge
 - Fix copyright
 - Merge
 - Implement patch with VectorCastNode::implemented
 - Merge branch 'master' into vectorize-subword
 - Address comments from review, refactor test
 - Add new conversions to benchmark
 - Fix some tests that now vectorize
 - ... and 2 more: https://git.openjdk.org/jdk/compare/bd7c7789...8c00ef84

-------------

Changes: https://git.openjdk.org/jdk/pull/23413/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=10
  Stats: 549 lines in 9 files changed: 510 ins; 7 del; 32 mod
  Patch: https://git.openjdk.org/jdk/pull/23413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413

PR: https://git.openjdk.org/jdk/pull/23413

From jkarthikeyan at openjdk.org Fri May 2 05:22:47 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Fri, 2 May 2025 05:22:47 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 21 Mar 2025 09:43:14 GMT, Emanuel Peter wrote:

>> @eme64 I think it should be good for another look over! I've addressed your review comments in the last commit.
>>
>> About the potential for performance degradation, I think it would be unlikely since the code generated by the cast is quite small (as it only needs to truncate or sign-extend) and the patch increases the amount of possible code that can auto-vectorize.
>> The one case that I can think of is that it might cause code that would be otherwise unprofitable to become vectorizable, but that would be because we don't have a cost model yet.
>
> @jaskarth Let me know if there is anything we can help you with here :)

@eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine:

                                                  Baseline                  Patch
Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
VectorSubword.byteToChar     1024  avgt   12  252.954 ±  4.129  ns/op   24.219 ± 0.453  ns/op  (10.4x)
VectorSubword.byteToInt      1024  avgt   12  194.707 ±  3.584  ns/op   38.353 ± 0.637  ns/op  (5.07x)
VectorSubword.byteToLong     1024  avgt   12   73.645 ±  1.418  ns/op   70.521 ± 0.470  ns/op  (no change)
VectorSubword.byteToShort    1024  avgt   12  252.647 ±  3.738  ns/op   22.664 ± 0.449  ns/op  (11.1x)
VectorSubword.charToByte     1024  avgt   12  236.396 ±  3.893  ns/op  228.710 ± 1.967  ns/op  (no change)
VectorSubword.charToInt      1024  avgt   12  179.673 ±  2.811  ns/op  173.764 ± 1.150  ns/op  (no change)
VectorSubword.charToLong     1024  avgt   12  184.867 ±  3.079  ns/op  177.999 ± 1.312  ns/op  (no change)
VectorSubword.charToShort    1024  avgt   12   24.385 ±  1.822  ns/op   22.375 ± 1.980  ns/op  (no change)
VectorSubword.intToByte      1024  avgt   12  190.949 ±  1.475  ns/op   49.376 ± 1.383  ns/op  (3.86x)
VectorSubword.intToChar      1024  avgt   12  182.862 ±  3.708  ns/op   44.344 ± 4.513  ns/op  (4.12x)
VectorSubword.intToLong      1024  avgt   12   76.072 ±  1.153  ns/op   73.382 ± 0.294  ns/op  (no change)
VectorSubword.intToShort     1024  avgt   12  184.362 ±  1.938  ns/op   45.556 ± 3.323  ns/op  (4.04x)
VectorSubword.longToByte     1024  avgt   12  150.766 ±  3.475  ns/op  146.651 ± 0.742  ns/op  (no change)
VectorSubword.longToChar     1024  avgt   12  121.764 ±  1.323  ns/op  117.068 ± 1.891  ns/op  (no change)
VectorSubword.longToInt      1024  avgt   12   83.761 ±  2.140  ns/op   82.084 ± 0.930  ns/op  (no change)
VectorSubword.longToShort    1024  avgt   12  132.293 ± 23.046  ns/op  115.883 ± 0.834  ns/op  (+ 12.4%)
VectorSubword.shortToByte    1024  avgt   12  253.387 ±  5.972  ns/op   27.591 ± 1.311  ns/op  (9.18x)
VectorSubword.shortToChar    1024  avgt   12   21.446 ±  1.914  ns/op   20.608 ± 1.593  ns/op  (no change)
VectorSubword.shortToInt     1024  avgt   12  187.109 ±  3.372  ns/op   36.818 ± 0.989  ns/op  (5.08x)
VectorSubword.shortToLong    1024  avgt   12   75.448 ±  0.930  ns/op   72.835 ± 0.507  ns/op  (no change)

Interestingly, even though the `longToType` methods are now vectorizable, the performance difference is very small. I think it could be because on my AVX2 machine, it can only process 4 elements per iteration and the overhead of the conversion is fairly high. It's interesting that the `longToInt` method is faster than the rest; I'm curious if something can be done on the backend to improve the speed. I think there could also be potential speed improvements on platforms with wider vectors, like AVX512.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846335293

From amitkumar at openjdk.org Fri May 2 05:38:47 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 2 May 2025 05:38:47 GMT
Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4]
In-Reply-To:
References:
Message-ID: <_WGtnwUi3XQhW-rneGI2mU8_k8Gp8mcOjl4k1mCQaaI=.19264860-3921-417b-afe7-bdd52284afa4@github.com>

On Tue, 29 Apr 2025 14:44:43 GMT, Amit Kumar wrote:

> How is the behavior of mvc specified when hitting a signal (SIGSEGV or SIGBUS)? Will all Bytes before that place be written?

MVC is a non-interruptible instruction. So either it completes (without exception), or else it is suppressed or nullified (with exception). In the latter case, no modification to memory is visible at all.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2846406211

From epeter at openjdk.org Fri May 2 06:03:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 06:03:51 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v9]
In-Reply-To:
References:
Message-ID:

On Sat, 26 Apr 2025 01:06:55 GMT, Mohamed Issa wrote:

>> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future.
>>
>> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled.
>>
>> For the first set of performance data collected with the new built-in range micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In all scenarios, the changes increase throughput values over _baseline1_.
>> The uplift over _baseline1_ is quite significant for the high value (100, 1000, 10000, 100000) scenarios. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they significantly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used.
>>
>> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) |
>> | :-------------------: | :-----------------: | :----------------: | :-------------------------: |
>> | [-1, 1] | 103342 | 103705 | +0.35 |
>> | [-2, 2] | 99977 | 100819 | +0.84 |
>> | [-10, 10] | 99147 | 100240 | +1.10 |
>> | [-20, 20] | 99419 | 99492 |...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Create separate tanh micro-benchmark module to avoid noise in MathBench

Testing is passing, patch looks reasonable to me :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23889#pullrequestreview-2811230567

From epeter at openjdk.org Fri May 2 06:03:52 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 06:03:52 GMT
Subject: RFR: 8348638: Performance regression in Math.tanh [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 25 Apr 2025 12:07:28 GMT, Jatin Bhateja wrote:

>> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Switch to constant double fields with separate micro-benchmarks
>
> Overall the patch looks good to me now apart from concerns around benchmark, existing Java implementation handles special cases upfront, thereby compromising the performance of most common cases. Java implementation scores above intrinsic in two outlier ranges < 2^-55 and > 22. While intrinsic implementation is performant for a meaty generic range, i.e. > 2^-55 and < 22.0
> We get around 30% performance uplift from intrinsic implementation over java implementation for the bulky generic input range.
> For ranges above 22.0, we now see better performance in comparison to the earlier intrinsic implementation.
>
> New benchmark shows clear gain for the value range [A][B][C] this patch optimizes.
>
>
> Baseline:
> =========
> Benchmark                                       (tanhRangeIndex)   Mode  Cnt       Score   Error   Units
> TanhPerf.TanhPerfConstant.tanhConstDouble1                   N/A  thrpt    2  117588.175          ops/ms
> TanhPerf.TanhPerfConstant.tanhConstDouble21                  N/A  thrpt    2  117550.954          ops/ms
> TanhPerf.TanhPerfConstant.tanhConstDoubleLarge               N/A  thrpt    2  117580.385          ops/ms  => A
> TanhPerf.TanhPerfConstant.tanhConstDoubleSmall               N/A  thrpt    2  403652.485          ops/ms
> TanhPerf.TanhPerfConstant.tanhConstDoubleTiny                N/A  thrpt    2  408909.294          ops/ms
> TanhPerf.TanhPerfRanges.tanhNegRangeDouble                     0  thrpt    2  397200.032          ops/ms
> TanhPerf.TanhPerfRanges.tanhNegRangeDouble                     1  thrpt    2  116082.297          ops/ms
> TanhPerf.TanhPerfRanges.tanhNegRangeDouble                     2  thrpt    2  112213.540          ops/ms
> TanhPerf.TanhPerfRanges.tanhNegRangeDouble                     3  thrpt    2  433899.459          ops/ms  => B
> TanhPerf.TanhPerfRanges.tanhPosDoubleRange                     0  thrpt    2  396818.181          ops/ms
> TanhPerf.TanhPerfRanges.tanhPosDoubleRange                     1  thrpt    2  115886.117          ops/ms
> TanhPerf.TanhPerfRanges.tanhPosDoubleRange                     2  thrpt    2  112048.023          ops/ms
> TanhPerf.TanhPerfRanges.tanhPosDoubleRange                     3  thrpt    2  440250.930          ops/ms  => C
>
> WithOpt:
> ========
> Benchmark                                       (tanhRangeIndex)   Mode  Cnt       Score   Error   Units
> TanhPerf.TanhPerfConstant.tanhConstDouble1                   N/A  thrpt    2  116459.753          ops/ms
> TanhPerf.TanhPerfConstant.tanhConstDouble21                  N/...

@jatin-bhateja @missa-prime Is https://bugs.openjdk.org/browse/JDK-8355238 related to this bug here?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2846434350 From epeter at openjdk.org Fri May 2 06:31:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 06:31:51 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4] In-Reply-To: References: Message-ID: On Thu, 1 May 2025 07:32:09 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
>
> Benchmarks on Nvidia Grace machine with 128-bit SVE2:
> With option `-XX:UseSVE=2`:
> ```
> Benchmark                   Unit   Before Score  Error        After Score  Error        Uplift
> testCompareEQMaskNotByte    ops/s  7912127.225   2677.289518  10266136.26  8955.008548  1.29
> testCompareEQMaskNotDouble  ops/s  884737.6799   446.963779   1179760.772  448.031844   1.33
> testCompareEQMaskNotFloat   ops/s  1765045.787   682.332214   2359520.803  896.305743   1.33
> testCompareEQMaskNotInt     ops/s  1787221.411   977.743935   2353952.519  960.069976   1.31
> testCompareEQMaskNotLong    ops/s  895297.1974   673.44808    1178449.02   323.804205   1.31
> testCompareEQMaskNotShort   ops/s  3339987.002   3415.2226    4712761.965  2110.862053  1.41
> testCompareGEMaskNotByte    ops/s  7907615.16    4094.243652  10251646.9   9486.699831  1.29
> testCompareGEMaskNotInt     ops/s  1683738.958   4233.813092  2352855.205  1251.952546  1.39
> testCompareGEMaskNotLong    ops/s  854496.1561   8594.598885  1177811.493  521.1229     1.37
> testCompareGEMaskNotShort   ops/s  3341860.309   1578.975338  4714008.434  1681.10365   1.41
> testCompareGTMaskNotByte    ops/s  7910823.674   2993.367032  10245063.58  9774.75138   1.29
> testCompareGTMaskNotInt     ops/s  1673393.928   3153.099431  2353654.5...

@erifan thanks for updating the tests!

Now I had a quick look at the VM code. My biggest observation is this:

Wrapping `VectorNode::Ideal` somewhere in the middle of your new optimization is going to make future optimizations here much harder. How would they check their conditions next to yours? That would be quite a mess.

I suggest you do this:
- `XorVNode::Ideal` does
  - checks `in1 == in2` case
  - calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it succeeded, i.e. returned something other than `nullptr`.
  - ... future optimizations could go here ...
  - Finally, i.e. none of the optimizations above worked: call `VectorNode::Ideal`

Then you pack all your new logic here into `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is just what I came up with just now.

This gives us a much more **modular** design, and it is easier to add another new optimization to `XorVNode::Ideal`. It is easy to change the precedence of the optimizations by just changing the order, etc.

src/hotspot/share/opto/vectornode.cpp line 2216:

> 2214:       in2->is_predicated_vector()) {
> 2215:     with_predicated = true;
> 2216:   }

Suggestion:

  bool with_predicated = is_predicated_vector() ||
                         in1->is_predicated_vector() ||
                         in2->is_predicated_vector();

src/hotspot/share/opto/vectornode.cpp line 2224:

> 2222:   //    => (VectorMaskCmp src1 src2 ncond)
> 2223:   // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
> 2224:   // negative comparison of cond.

Suggestion:

  // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt.
  // ncond is the negative comparison of cond.

I was getting lost in all the commas.

src/hotspot/share/opto/vectornode.cpp line 2248:

> 2246:       !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() ||
> 2247:       !VectorNode::is_all_ones_vector(in2)) {
> 2248:     return VectorNode::Ideal(phase, can_reshape);

Hmm, so this is really the "else" case, if your optimization does not succeed, right?

Wrapping `VectorNode::Ideal` somewhere in the middle is going to make future optimizations here much harder. How would they check their conditions next to yours? That would be quite a mess.

I suggest you do this:
- `XorVNode::Ideal` does
  - checks `in1 == in2` case
  - calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it succeeded, i.e. returned something other than `nullptr`.
  - Finally, i.e. none of the optimizations above worked: call `VectorNode::Ideal`

Then you pack all your new logic here into `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is just what I came up with just now.

This gives us a much more **modular** design, and it is easier to add another new optimization to `XorVNode::Ideal`. It is easy to change the precedence of the optimizations by just changing the order, etc.
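
The ordering proposed above could look roughly like the following. This is only a minimal, self-contained toy model in C++ and not HotSpot code: `Node`, `Kind`, `Cond`, `negate` and the `ideal_*` helper names are hypothetical stand-ins, a `false` return plays the role of `nullptr` ("no progress"), and details such as predicated vectors, `VectorMaskCast`, and the float/double restriction to eq/ne are left out on purpose.

```cpp
#include <cassert>

// Toy stand-ins for C2 node kinds and comparison predicates (assumptions,
// not the real HotSpot types).
enum class Kind { Xor, MaskCmp, AllOnes, Zero, Other };
enum class Cond { eq, ne, lt, ge, gt, le };

struct Node {
  Kind kind;
  Cond cond;        // only meaningful when kind == Kind::MaskCmp
  const Node* in1;  // inputs, only used by Kind::Xor in this sketch
  const Node* in2;
};

// Returns the comparison that is true exactly when `c` is false.
Cond negate(Cond c) {
  switch (c) {
    case Cond::eq: return Cond::ne;
    case Cond::ne: return Cond::eq;
    case Cond::lt: return Cond::ge;
    case Cond::ge: return Cond::lt;
    case Cond::gt: return Cond::le;
    case Cond::le: return Cond::gt;
  }
  return c;  // not reached for valid enum values
}

// Optimization 1: (Xor x x) => zero.
bool ideal_same_inputs(const Node& n, Node& out) {
  if (n.in1 != n.in2) return false;
  out = Node{Kind::Zero, Cond::eq, nullptr, nullptr};
  return true;
}

// Optimization 2: (Xor (MaskCmp a b cond) all-ones) => (MaskCmp a b ncond).
bool ideal_xor_mask_cmp(const Node& n, Node& out) {
  assert(n.in1 != nullptr && n.in2 != nullptr);
  if (n.in1->kind != Kind::MaskCmp || n.in2->kind != Kind::AllOnes) return false;
  out = *n.in1;
  out.cond = negate(out.cond);
  return true;
}

// Fallback standing in for the generic parent-class Ideal; it never makes
// progress in this toy model.
bool ideal_generic(const Node&, Node&) { return false; }

// Each optimization is tried in turn; precedence is just the order of the
// calls, and the generic fallback is reached only if nothing else applied.
bool xor_ideal(const Node& n, Node& out) {
  if (ideal_same_inputs(n, out)) return true;
  if (ideal_xor_mask_cmp(n, out)) return true;
  // ... future optimizations could go here ...
  return ideal_generic(n, out);
}
```

With this shape, adding another pattern means adding one helper and one line in `xor_ideal`, without touching the conditions of the existing helpers.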
------------- PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2811246433 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2071147043 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2071148329 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2071156329 From epeter at openjdk.org Fri May 2 06:31:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 06:31:51 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 06:14:19 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 1... > > src/hotspot/share/opto/vectornode.cpp line 2216: > >> 2214: in2->is_predicated_vector()) { >> 2215: with_predicated = true; >> 2216: } > > Suggestion: > > bool with_predicated = is_predicated_vector() || > in1->is_predicated_vector() || > in2->is_predicated_vector(); Would that not be easier to read? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2071147181 From epeter at openjdk.org Fri May 2 06:41:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 06:41:46 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Let me know if there is anything we can help you with here :) > > @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. 
For the benchmark, I got these results on my machine: > > Baseline Patch > Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement > VectorSubword.byteToChar 1024 avgt 12 252.954 ? 4.129 ns/op 24.219 ? 0.453 ns/op (10.4x) > VectorSubword.byteToInt 1024 avgt 12 194.707 ? 3.584 ns/op 38.353 ? 0.637 ns/op (5.07x) > VectorSubword.byteToLong 1024 avgt 12 73.645 ? 1.418 ns/op 70.521 ? 0.470 ns/op (no change) > VectorSubword.byteToShort 1024 avgt 12 252.647 ? 3.738 ns/op 22.664 ? 0.449 ns/op (11.1x) > VectorSubword.charToByte 1024 avgt 12 236.396 ? 3.893 ns/op 228.710 ? 1.967 ns/op (no change) > VectorSubword.charToInt 1024 avgt 12 179.673 ? 2.811 ns/op 173.764 ? 1.150 ns/op (no change) > VectorSubword.charToLong 1024 avgt 12 184.867 ? 3.079 ns/op 177.999 ? 1.312 ns/op (no change) > VectorSubword.charToShort 1024 avgt 12 24.385 ? 1.822 ns/op 22.375 ? 1.980 ns/op (no change) > VectorSubword.intToByte 1024 avgt 12 190.949 ? 1.475 ns/op 49.376 ? 1.383 ns/op (3.86x) > VectorSubword.intToChar 1024 avgt 12 182.862 ? 3.708 ns/op 44.344 ? 4.513 ns/op (4.12x) > VectorSubword.intToLong 1024 avgt 12 76.072 ? 1.153 ns/op 73.382 ? 0.294 ns/op (no change) > VectorSubword.intToShort 1024 avgt 12 184.362 ? 1.938 ns/op 45.556 ? 3.323 ns/op (4.04x) > VectorSubword.longToByte 1024 avgt 12 150.766 ? 3.475 ns/op 146.651 ? 0.742 ns/op (no change) > VectorSubword.longToChar 1024 avgt 12 121.764 ? 1.323 ns/op 117.068 ? 1.891 ns/op (no change) > VectorSubword.longToInt 1024 avgt 12 83.761 ? 2.140 ns/op 82.084 ? 0.930 ns/op (no change) > VectorSubword.longToShort 1024 avgt 12 132.293 ? 23.046 ns/op 115.883 ? 0.834 ns/op (+ 12.4%) > VectorSubword.shortToByte 1024 avgt 12 253.387 ? 5.972 ns/op 27.591 ? 1.311 ns/op (9.18x) > VectorSubword.shortToChar 1024 avgt 12 21.446 ? 1.914 ns/op 20.608 ? 1.593 ns/op (no change) > VectorSubword.shortToInt 1024 avgt 12 187.109 ? 3.372 ns/op 36.818 ? 0.989 ns/op (5.08x) > VectorSubword.shortToLong 1024 avgt 12 75.448 ? 0.930 ns/op 72.835 ? 
0.507 ns/op (no change) > > Interestingly, eve... @jaskarth Nice, thanks for posting the more exhaustive benchmark! I'm glad that this allowed us to find some more things to investigate later :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846482345 From epeter at openjdk.org Fri May 2 06:46:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 06:46:49 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 04:44:37 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: > > - Address more comments, make test and benchmark more exhaustive > - Merge from master > - Fix copyright after merge > - Fix copyright > - Merge > - Implement patch with VectorCastNode::implemented > - Merge branch 'master' into vectorize-subword > - Address comments from review, refactor test > - Add new conversions to benchmark > - Fix some tests that now vectorize > - ... and 2 more: https://git.openjdk.org/jdk/compare/bd7c7789...8c00ef84 src/hotspot/share/opto/superword.cpp line 2361: > 2359: > 2360: // Subword cast: Element sizes differ, but the platform supports a cast to change the def shape to the use shape. > 2361: Suggestion: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2071170438 From epeter at openjdk.org Fri May 2 06:53:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 06:53:46 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Let me know if there is anything we can help you with here :) > > @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine: > > Baseline Patch > Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement > VectorSubword.byteToChar 1024 avgt 12 252.954 ± 4.129 ns/op 24.219 ± 0.453 ns/op (10.4x) > VectorSubword.byteToInt 1024 avgt 12 194.707 ± 3.584 ns/op 38.353 ± 0.637 ns/op (5.07x) > VectorSubword.byteToLong 1024 avgt 12 73.645 ± 1.418 ns/op 70.521 ± 0.470 ns/op (no change) > VectorSubword.byteToShort 1024 avgt 12 252.647 ± 3.738 ns/op 22.664 ± 0.449 ns/op (11.1x) > VectorSubword.charToByte 1024 avgt 12 236.396 ± 3.893 ns/op 228.710 ± 1.967 ns/op (no change) > VectorSubword.charToInt 1024 avgt 12 179.673 ± 2.811 ns/op 173.764 ±
1.150 ns/op (no change) > VectorSubword.charToLong 1024 avgt 12 184.867 ± 3.079 ns/op 177.999 ± 1.312 ns/op (no change) > VectorSubword.charToShort 1024 avgt 12 24.385 ± 1.822 ns/op 22.375 ± 1.980 ns/op (no change) > VectorSubword.intToByte 1024 avgt 12 190.949 ± 1.475 ns/op 49.376 ± 1.383 ns/op (3.86x) > VectorSubword.intToChar 1024 avgt 12 182.862 ± 3.708 ns/op 44.344 ± 4.513 ns/op (4.12x) > VectorSubword.intToLong 1024 avgt 12 76.072 ± 1.153 ns/op 73.382 ± 0.294 ns/op (no change) > VectorSubword.intToShort 1024 avgt 12 184.362 ± 1.938 ns/op 45.556 ± 3.323 ns/op (4.04x) > VectorSubword.longToByte 1024 avgt 12 150.766 ± 3.475 ns/op 146.651 ± 0.742 ns/op (no change) > VectorSubword.longToChar 1024 avgt 12 121.764 ± 1.323 ns/op 117.068 ± 1.891 ns/op (no change) > VectorSubword.longToInt 1024 avgt 12 83.761 ± 2.140 ns/op 82.084 ± 0.930 ns/op (no change) > VectorSubword.longToShort 1024 avgt 12 132.293 ± 23.046 ns/op 115.883 ± 0.834 ns/op (+ 12.4%) > VectorSubword.shortToByte 1024 avgt 12 253.387 ± 5.972 ns/op 27.591 ± 1.311 ns/op (9.18x) > VectorSubword.shortToChar 1024 avgt 12 21.446 ± 1.914 ns/op 20.608 ± 1.593 ns/op (no change) > VectorSubword.shortToInt 1024 avgt 12 187.109 ± 3.372 ns/op 36.818 ± 0.989 ns/op (5.08x) > VectorSubword.shortToLong 1024 avgt 12 75.448 ± 0.930 ns/op 72.835 ± 0.507 ns/op (no change) > > Interestingly, eve... @jaskarth This looks really solid now. Another thought I had: Generally, vectorization is faster because we use fewer instructions. Example: if you had 8 loads, 4 adds and 4 stores, you now have 2 loads, 1 add, and 1 store. The cost per operation is roughly the same, but there are fewer operations now, so that makes it faster. It is of course a little more complicated, but still a good heuristic. But as soon as vectorization requires additional instructions, such as reductions (with shuffle inside) or your subword conversions now, then that is additional cost.
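The counting heuristic above can be sketched in a few lines of Java. This is only a rough illustration under stated assumptions: every operation is taken to cost the same, the loop is an element-wise `c[i] = a[i] + b[i]`, and the class name `OpCountSketch`, the lane count and the "extra operations per vector" parameter are hypothetical; none of this is C2's actual cost model.

```java
final class OpCountSketch {
    // Scalar c[i] = a[i] + b[i]: two loads, one add and one store per element.
    static int scalarOps(int elements) {
        return elements * 4;
    }

    // Vectorized form: the same four operations once per vector of `lanes`
    // elements, plus any extra operations (casts, shuffles) per vector.
    // Assumes `elements` is a multiple of `lanes`.
    static int vectorOps(int elements, int lanes, int extraOpsPerVector) {
        int vectors = elements / lanes;
        return vectors * (4 + extraOpsPerVector);
    }

    public static void main(String[] args) {
        // 4 elements: 8 loads + 4 adds + 4 stores in the scalar loop.
        System.out.println("scalar:             " + scalarOps(4));
        // One 4-lane vector: 2 loads + 1 add + 1 store.
        System.out.println("vector:             " + vectorOps(4, 4, 0));
        // The same vector loop with one hypothetical cast per vector.
        System.out.println("vector + 1 extra op: " + vectorOps(4, 4, 1));
    }
}
```

With equal per-operation cost, the extra operation narrows the gap between the scalar and vector loops but does not close it; a regression would need the extra operations to be more numerous or more expensive than assumed here.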
Reductions are not always profitable with vectorization, sometimes the shuffles make it more expensive than the scalar loop. I wonder if there could be a similar edge case with these subword conversions, which might actually lead to a small regression. I'm not saying this should be a blocker here, but I'm interested in this for my future work on the cost model, we might have some interesting cases here that I'll want to evaluate: https://bugs.openjdk.org/browse/JDK-8340093 If you can find any such regression case with subword casts, then it would be great if we could keep track of it, so I can try to address it with the cost model later :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846499207 From epeter at openjdk.org Fri May 2 06:58:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 06:58:46 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: <0_Z_GdHfrASV-JhPNF7k5HZ0STgjouFtQKN3jDnWYjE=.ba07e18a-caef-4145-b9ca-c77ceec3456e@github.com> On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Let me know if there is anything we can help you with here :) > > @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine: > > Baseline Patch > Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement > VectorSubword.byteToChar 1024 avgt 12 252.954 ± 4.129 ns/op 24.219 ± 0.453 ns/op (10.4x) > VectorSubword.byteToInt 1024 avgt 12 194.707 ± 3.584 ns/op 38.353 ± 0.637 ns/op (5.07x) > VectorSubword.byteToLong 1024 avgt 12 73.645 ± 1.418 ns/op 70.521 ± 0.470 ns/op (no change) > VectorSubword.byteToShort 1024 avgt 12 252.647 ± 3.738 ns/op 22.664 ± 0.449 ns/op (11.1x) > VectorSubword.charToByte 1024 avgt 12 236.396 ± 3.893 ns/op 228.710 ±
1.967 ns/op (no change) > VectorSubword.charToInt 1024 avgt 12 179.673 ± 2.811 ns/op 173.764 ± 1.150 ns/op (no change) > VectorSubword.charToLong 1024 avgt 12 184.867 ± 3.079 ns/op 177.999 ± 1.312 ns/op (no change) > VectorSubword.charToShort 1024 avgt 12 24.385 ± 1.822 ns/op 22.375 ± 1.980 ns/op (no change) > VectorSubword.intToByte 1024 avgt 12 190.949 ± 1.475 ns/op 49.376 ± 1.383 ns/op (3.86x) > VectorSubword.intToChar 1024 avgt 12 182.862 ± 3.708 ns/op 44.344 ± 4.513 ns/op (4.12x) > VectorSubword.intToLong 1024 avgt 12 76.072 ± 1.153 ns/op 73.382 ± 0.294 ns/op (no change) > VectorSubword.intToShort 1024 avgt 12 184.362 ± 1.938 ns/op 45.556 ± 3.323 ns/op (4.04x) > VectorSubword.longToByte 1024 avgt 12 150.766 ± 3.475 ns/op 146.651 ± 0.742 ns/op (no change) > VectorSubword.longToChar 1024 avgt 12 121.764 ± 1.323 ns/op 117.068 ± 1.891 ns/op (no change) > VectorSubword.longToInt 1024 avgt 12 83.761 ± 2.140 ns/op 82.084 ± 0.930 ns/op (no change) > VectorSubword.longToShort 1024 avgt 12 132.293 ± 23.046 ns/op 115.883 ± 0.834 ns/op (+ 12.4%) > VectorSubword.shortToByte 1024 avgt 12 253.387 ± 5.972 ns/op 27.591 ± 1.311 ns/op (9.18x) > VectorSubword.shortToChar 1024 avgt 12 21.446 ± 1.914 ns/op 20.608 ± 1.593 ns/op (no change) > VectorSubword.shortToInt 1024 avgt 12 187.109 ± 3.372 ns/op 36.818 ± 0.989 ns/op (5.08x) > VectorSubword.shortToLong 1024 avgt 12 75.448 ± 0.930 ns/op 72.835 ± 0.507 ns/op (no change) > > Interestingly, eve... @jaskarth I'm running some internal testing now, feel free to ping me after the weekend for results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846505907 From mchevalier at openjdk.org Fri May 2 07:21:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 2 May 2025 07:21:45 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 16:56:54 GMT, Vladimir Kozlov wrote: > I would not do that.
First compilation of run() will be OSR and we will never run it fully compiled. You need several iterations in main() to trigger and use normal compilation. But 100 iterations should be fine. This should put execution time under 1 sec. I thought so too, but actually, `run()` is OSR compiled at first (which doesn't reproduce) and then fully compiled (where the crash happens). I don't understand why, but I can see it happening. Of course, I can also make a hundred iterations, that is cheap enough. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2846539478 From jbhateja at openjdk.org Fri May 2 07:50:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 May 2025 07:50:27 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v7] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
> > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Addressing code refactoring comments - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Fix windows build - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Add dynamic sized feature vectors - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - dropping unneeded feature enabling/checks - 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: https://git.openjdk.org/jdk/pull/24329/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=06 Stats: 545 lines in 27 files changed: 315 ins; 14 del; 216 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From duke at openjdk.org Fri May 2 07:54:31 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Fri, 2 May 2025 07:54:31 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v2] In-Reply-To: References: Message-ID: > Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion.
> > Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, nodes are first eliminated from the macro list. This includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs because the strip mined loop is refined in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix refines the strip mined loop before the elimination process. More specifically, it moves the call to `adjust_strip_mined_loop` outside the elimination loop. > > Improvement: As suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), eliminating macro nodes by calling `eliminate_macro_nodes` and then performing additional Opaque and LoopLimit node elimination in `expand_macro_nodes` is unintuitive, and the current fix should be moved along with the other elimination code. Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: added the test to JTREG ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24890/files - new: https://git.openjdk.org/jdk/pull/24890/files/5c24df77..8d045cb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=00-01 Stats: 49 lines in 1 file changed: 49 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24890/head:pull/24890 PR: https://git.openjdk.org/jdk/pull/24890 From mdoerr at openjdk.org Fri May 2 08:04:49 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 2 May 2025 08:04:49 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x.
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation Interesting. Thanks for finding it out! So, this makes the behavior different to all other platforms which write all bytes before the address which is not writable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2846607572 From mchevalier at openjdk.org Fri May 2 08:07:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 2 May 2025 08:07:57 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls Message-ID: A first part toward a better support of pure functions. ## Pure Functions Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. ## Scope We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. 
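As a small illustration of the distinction drawn above, the Java sketch below contrasts operations that never throw with one that can. It only demonstrates the language-level semantics; the class name `PureCallSketch` and its helpers are hypothetical, and which operations C2 actually models as pure calls is not something this sketch claims beyond what the description states.

```java
final class PureCallSketch {
    // Floating-point remainder never throws: a zero divisor yields NaN.
    static double remainder(double x, double y) {
        return x % y;
    }

    // Math.log never throws either: log(0.0) is -Infinity, log(-1.0) is NaN.
    static double log(double x) {
        return Math.log(x);
    }

    // Integer division is not pure in this sense: a zero divisor throws,
    // so the division must stay even if its result is unused.
    static boolean intDivThrows(int x, int y) {
        try {
            int unused = x / y;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(remainder(5.0, 0.0)); // NaN
        System.out.println(log(0.0));            // -Infinity
        System.out.println(intDivThrows(1, 0));  // true
    }
}
```

If the results of `remainder` or `log` were never used, dropping the calls would not change the observable behavior of the program, which is the property that removal of unused pure calls relies on.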
## Implementation Overview We created a new kind of node for pure calls; these nodes are expanded into regular calls during macro expansion. This also allows the removal of the `ModD` and `ModF` nodes, which now have pure equivalents. They are surprisingly hard to unify with other floating point functions from an implementation point of view! IR framework and IGV needed a little bit of fixing. Thanks, Marc ------------- Commit messages: - Clean up IRNode - cleanup - hash and cmp - get_early_ctrl_for_expensive - depends_only_on_test - depends_only_on_test - First try Changes: https://git.openjdk.org/jdk/pull/24966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24966&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347901 Stats: 694 lines in 15 files changed: 449 ins; 226 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24966/head:pull/24966 PR: https://git.openjdk.org/jdk/pull/24966 From jbhateja at openjdk.org Fri May 2 08:08:27 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 May 2025 08:08:27 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v8] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel®
AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/04de0289..4a614be8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=06-07 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From epeter at openjdk.org Fri May 2 08:12:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 08:12:51 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 04:44:37 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. 
This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Address more comments, make test and benchmark more exhaustive > - Merge from master > - Fix copyright after merge > - Fix copyright > - Merge > - Implement patch with VectorCastNode::implemented > - Merge branch 'master' into vectorize-subword > - Address comments from review, refactor test > - Add new conversions to benchmark > - Fix some tests that now vectorize > - ... and 2 more: https://git.openjdk.org/jdk/compare/bd7c7789...8c00ef84 I ran the benchmarks on my `avx512` laptop.
I noticed it would take 16 min, so I cut it down as follows, and it now takes a little more than 5 min: @Warmup(iterations = 2, time = 1, timeUnit = TimeUnit.SECONDS) @Measurement(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) ![image](https://github.com/user-attachments/assets/2ad2a043-55eb-4970-b597-9c30425d18be) @jaskarth I think your "improvement" in `VectorSubword.longToShort` is just due to noise in your master run; the improvement lies within the error. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846621819 From epeter at openjdk.org Fri May 2 08:17:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 08:17:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 00:50:31 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440; node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's reviews Comments for `type.hpp` src/hotspot/share/opto/type.hpp line 293: > 291: > 292: template > 293: const TypeClass* try_cast() const; This is a way to get to the `isa_...` via templated types, right? I wonder if it might be better to name it `isa_???`, or even just `isa`, so that it is clearer that it is about the `isa` query. Currently, it is not very clear what it does, until you look at the implementation. That's a bit unfortunate. src/hotspot/share/opto/type.hpp line 636: > 634: * of a TypeInt iff: > 635: * > 636: * v >= _lo && v <= _hi && juint(v) >= _ulo && juint(v) <= _uhi && _bits.is_satisfied_by(v) Suggestion: * A TypeInt represents a set of non-empty jint values. A jint v is an element * of a TypeInt iff: * _lo <= v <= _hi && * _ulo <= juint(v) <= _uhi && * _bits.is_satisfied_by(v) Would be equivalent, but more readable, right? src/hotspot/share/opto/type.hpp line 645: > 643: * > 644: * Then, t1 and t2 both represent the set {3, 5}. We can also see that the > 645: * constraints of t2 are tightest possible. I.e there exists no TypeInt t3 Suggestion: * constraints of t2 are the tightest possible. 
I.e there exists no TypeInt t3 src/hotspot/share/opto/type.hpp line 649: > 647: * > 648: * t3._lo > t2._lo || t3._hi < t2._hi || t3._ulo > t2._ulo || t3._uhi < t2._uhi || > 649: * (t3._bits._zeros &~ t2._bis._zeros) != 0 || (t3._bits._ones &~ t2._bits._ones) != 0 Suggestion: * which also represents {3, 5} such that any of these would be true: * 1) t3._lo > t2._lo * 2) t3._hi < t2._hi * 3) t3._ulo > t2._ulo * 4) t3._uhi < t2._uhi * 5) (t3._bits._zeros &~ t2._bis._zeros) != 0 * 6) (t3._bits._ones &~ t2._bits._ones) != 0 src/hotspot/share/opto/type.hpp line 654: > 652: * t3._bits._zeros and t2._bits._zeros is not empty, which means that the > 653: * bits in t3._bits._zeros is not a subset of those in t2._bits._zeros, the > 654: * same applies to _bits._ones Then you can refer to the `condition 5)` and `condition 6)` and the reader can find them quicker, and does not have to worry if you are counting 0 or 1 based ;) src/hotspot/share/opto/type.hpp line 657: > 655: * > 656: * As a result, every TypeInt is canonicalized to its tightest form upon > 657: * construction. This makes it easier to reason about them in optimizations. As a result of what? I think this could be better: Suggestion: * To simplify reasoning about the types in optimizations, we canonicalize every * TypeInt to its tightest form, already at construction. src/hotspot/share/opto/type.hpp line 702: > 700: * > 701: * Proof of lemma 3: > 702: * Lemma 3.1: For 2 jint value x, y such that they are both >= 0 or both < 0. Suggestion: * Lemma 3.1: Given 2 jint values x, y where either both >= 0 or both < 0. src/hotspot/share/opto/type.hpp line 708: > 706: * I.e. x <= y in the signed domain iff x <= y in the unsigned domain > 707: * > 708: * Then, we have: The two `Then` are confusing me a little. The assumption is only the first sentence, not `x <= y iff juint(x) <= juint(y)`, right? Hmm, ah you have an additional Lemma 3.1 here. But it is a little unclear where that starts and ends. 
Maybe that can be fixed with some indentation? src/hotspot/share/opto/type.hpp line 714: > 712: * a. t._lo >= 0, we have: > 713: * > 714: * 0 <= t_lo <= jint(t._ulo) (lemma 2.1) Suggestion: * a. 0 <= t._lo, we have: * * 0 <= t._lo <= jint(t._ulo) (lemma 2.1) src/hotspot/share/opto/type.hpp line 735: > 733: * b. t._hi < 0. Similarly, t._lo == jint(t._ulo) and t._hi == jint(t._uhi) > 734: * > 735: * c. t._lo < 0, t._hi >= 0. Suggestion: * c. t._lo < 0, 0 <= t._hi. I like ordering numbers according to their value :) src/hotspot/share/opto/type.hpp line 745: > 743: * > 744: * In this case, all elements of t belongs to either [t._lo, jint(t._uhi)] or > 745: * [jint(t._ulo), t._hi]. When you say "all elements belong to x or y", one might misunderstand that they all are in one range and the other is empty. Suggestion: * In this case, each element of t belongs to either [t._lo, jint(t._uhi)] or * [jint(t._ulo), t._hi]. Here we look at them individually, and so it sounds a little less misleading to me. But that may just be my brain that sees it like this. src/hotspot/share/opto/type.hpp line 772: > 770: private: > 771: TypeInt(const TypeIntPrototype& t, int w, bool dual); > 772: static const Type* try_make(const TypeIntPrototype& t, int widen, bool dual); Just an idea, very optional. `try_make` does not say what it does when it fails. Exception? `nullptr`? `TOP`? You could rename it to `try_make_else_null` or `make_or_null`. Something like that.
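To make the membership condition discussed in this review concrete, here is a minimal Java sketch of the predicate `_lo <= v <= _hi && _ulo <= juint(v) <= _uhi && _bits.is_satisfied_by(v)`. It is not the HotSpot implementation: the class `TypeIntSketch`, its field names and the two example constraint sets are assumptions for illustration, using the [0, 3] example from the PR description to show a loose and a tightest form that describe the same set.

```java
final class TypeIntSketch {
    final int lo, hi;       // signed bounds
    final int ulo, uhi;     // unsigned bounds, stored in an int
    final int zeros, ones;  // bits known to be 0 and bits known to be 1

    TypeIntSketch(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        this.lo = lo; this.hi = hi;
        this.ulo = ulo; this.uhi = uhi;
        this.zeros = zeros; this.ones = ones;
    }

    // v is an element iff it satisfies all three groups of constraints.
    boolean contains(int v) {
        return lo <= v && v <= hi
            && Integer.compareUnsigned(ulo, v) <= 0
            && Integer.compareUnsigned(v, uhi) <= 0
            && (v & zeros) == 0
            && (v & ones) == ones;
    }

    // Signed range [0, 3] with no unsigned or bit information.
    static final TypeIntSketch LOOSE = new TypeIntSketch(0, 3, 0, -1, 0, 0);
    // The same set with every constraint tightened.
    static final TypeIntSketch TIGHT = new TypeIntSketch(0, 3, 0, 3, ~3, 0);

    public static void main(String[] args) {
        for (int v = -8; v <= 8; v++) {
            System.out.println(v + ": loose=" + LOOSE.contains(v)
                + " tight=" + TIGHT.contains(v));
        }
    }
}
```

Both instances accept exactly 0, 1, 2 and 3, which is why canonicalizing to the tightest form loses nothing while making the constraints easier to reason about.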
------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2811316183 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071192824 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071196300 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071197526 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071201252 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071202049 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071205778 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071210512 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071215527 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071222242 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071229350 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071232432 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071242488 From epeter at openjdk.org Fri May 2 08:17:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 08:17:09 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 07:29:41 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/opto/type.hpp line 708: > >> 706: * I.e. x <= y in the signed domain iff x <= y in the unsigned domain >> 707: * >> 708: * Then, we have: > > The two `Then` are confusing me a little. The assumption is only the first sentence, not `x <= y iff juint(x) <= juint(y)`, right? > > Hmm, ah you have an additional Lemma 3.1 here. But it is a little unclear where that starts and ends. Maybe that can be fixed with some indentation? 
Or you just cram in the Lemma 3.1 before you start the proof of Lemma 3. That would probably be cleanest. > src/hotspot/share/opto/type.hpp line 745: > >> 743: * >> 744: * In this case, all elements of t belongs to either [t._lo, jint(t._uhi)] or >> 745: * [jint(t._ulo), t._hi]. > > When you say "all elements belong to x or y", one might misunderstand that they all are in one range and the other is empty. > Suggestion: > > * In this case, each element of t belongs to either [t._lo, jint(t._uhi)] or > * [jint(t._ulo), t._hi]. > > Here we look at them individually, and so it sounds a little less misleading to me. But that may just be my brain that sees it like this ? Anyway: either all elements belong or each element belongs ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071216411 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071233010 From mli at openjdk.org Fri May 2 08:33:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 2 May 2025 08:33:54 GMT Subject: Integrated: 8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 14:46:23 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple improvement? > > By rvv spec, > "integer compare instructions write 1 to the destination mask register element if the comparison evaluates to true, and 0 otherwise." > "These vector FP compare instructions compare two source operands and write the comparison result to a mask register. " > > So, it's not always necessary to clear the mask register before vector comparison operation, e.g. when `vm != Assembler::v0_t`. > > Thanks! This pull request has now been integrated. 
Changeset: 811f117c
Author: Hamlin Li
URL: https://git.openjdk.org/jdk/commit/811f117ce396ac7aafd71f5618f2de96bb96f311
Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod

8355980: RISC-V: remove vmclr_m before vmsXX and vmfXX

Reviewed-by: dzhang, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/24968

From epeter at openjdk.org Fri May 2 08:33:29 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 08:33:29 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 04:44:37 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                                  Baseline                  Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score     Error  Units    Score    Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ±  3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ±  1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ±  1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: > > - Address more comments, make test and benchmark more exhaustive > - Merge from master > - Fix copyright after merge > - Fix copyright > - Merge > - Implement patch with VectorCastNode::implemented > - Merge branch 'master' into vectorize-subword > - Address comments from review, refactor test > - Add new conversions to benchmark > - Fix some tests that now vectorize > - ... and 2 more: https://git.openjdk.org/jdk/compare/bd7c7789...8c00ef84 Now I ran the same benchmark, but the baseline is without vectorization. ![image](https://github.com/user-attachments/assets/41918860-ad8a-489f-8ca0-f1a84b01f3e6) Generally, we can see that vectorization is now leading to speedups in all cases, except: - `charToByte`: seems to have a 30% slowdown with vectorization, yikes! - `charToInt`: why? - `charToLong`: why? @jaskarth You don't have to investigate that now, or even at all. Your PR only improves the situation. But if you want, it would be nice if you could check if you get similar results on your machine. And if yes, we could see what to do about these next :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846656946 From mchevalier at openjdk.org Fri May 2 08:43:23 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 2 May 2025 08:43:23 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs [v2] In-Reply-To: References: Message-ID: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. 
> > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: speed up slowest test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24817/files - new: https://git.openjdk.org/jdk/pull/24817/files/7918a832..3232e5b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24817&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24817&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24817.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24817/head:pull/24817 PR: https://git.openjdk.org/jdk/pull/24817 From mchevalier at openjdk.org Fri May 2 08:43:24 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 2 May 2025 08:43:24 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. 
> > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc I've pushed the suggested change. The test still passes, longest result was 12s, from 40s without fix, 8s with my more radical fix: so it's still a big improvement. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2846675598 From aph at openjdk.org Fri May 2 08:43:46 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 2 May 2025 08:43:46 GMT Subject: RFR: 8354347: Increase the default padding size for aarch64 in JDK code. In-Reply-To: References: Message-ID: On Fri, 2 May 2025 00:49:45 GMT, Peter B. Kessler wrote: > Increase the default padding for C++ fields to avoid false sharing. Thanks. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24994#pullrequestreview-2811475908 From epeter at openjdk.org Fri May 2 08:48:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 08:48:49 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Let me know if there is anything we can help you with here :) > > @eme64 Thank you for the comments! 
> I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine:
>
>                                                  Baseline                  Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score     Error  Units    Score    Error  Units  Improvement
> VectorSubword.byteToChar     1024  avgt   12  252.954 ±  4.129  ns/op   24.219 ±  0.453  ns/op  (10.4x)
> VectorSubword.byteToInt      1024  avgt   12  194.707 ±  3.584  ns/op   38.353 ±  0.637  ns/op  (5.07x)
> VectorSubword.byteToLong     1024  avgt   12   73.645 ±  1.418  ns/op   70.521 ±  0.470  ns/op  (no change)
> VectorSubword.byteToShort    1024  avgt   12  252.647 ±  3.738  ns/op   22.664 ±  0.449  ns/op  (11.1x)
> VectorSubword.charToByte     1024  avgt   12  236.396 ±  3.893  ns/op  228.710 ±  1.967  ns/op  (no change)
> VectorSubword.charToInt      1024  avgt   12  179.673 ±  2.811  ns/op  173.764 ±  1.150  ns/op  (no change)
> VectorSubword.charToLong     1024  avgt   12  184.867 ±  3.079  ns/op  177.999 ±  1.312  ns/op  (no change)
> VectorSubword.charToShort    1024  avgt   12   24.385 ±  1.822  ns/op   22.375 ±  1.980  ns/op  (no change)
> VectorSubword.intToByte      1024  avgt   12  190.949 ±  1.475  ns/op   49.376 ±  1.383  ns/op  (3.86x)
> VectorSubword.intToChar      1024  avgt   12  182.862 ±  3.708  ns/op   44.344 ±  4.513  ns/op  (4.12x)
> VectorSubword.intToLong      1024  avgt   12   76.072 ±  1.153  ns/op   73.382 ±  0.294  ns/op  (no change)
> VectorSubword.intToShort     1024  avgt   12  184.362 ±  1.938  ns/op   45.556 ±  3.323  ns/op  (4.04x)
> VectorSubword.longToByte     1024  avgt   12  150.766 ±  3.475  ns/op  146.651 ±  0.742  ns/op  (no change)
> VectorSubword.longToChar     1024  avgt   12  121.764 ±  1.323  ns/op  117.068 ±  1.891  ns/op  (no change)
> VectorSubword.longToInt      1024  avgt   12   83.761 ±  2.140  ns/op   82.084 ±  0.930  ns/op  (no change)
> VectorSubword.longToShort    1024  avgt   12  132.293 ± 23.046  ns/op  115.883 ±  0.834  ns/op  (+ 12.4%)
> VectorSubword.shortToByte    1024  avgt   12  253.387 ±  5.972  ns/op   27.591 ±  1.311  ns/op  (9.18x)
> VectorSubword.shortToChar    1024  avgt   12   21.446 ±  1.914  ns/op   20.608 ±  1.593  ns/op  (no change)
> VectorSubword.shortToInt     1024  avgt   12  187.109 ±  3.372  ns/op   36.818 ±
0.989 ns/op (5.08x) > VectorSubword.shortToLong 1024 avgt 12 75.448 ? 0.930 ns/op 72.835 ? 0.507 ns/op (no change) > > Interestingly, eve... @jaskarth I ran it with `perf`: `make test TEST="micro:VectorSubword.charToByte" CONF=linux-x64 TEST_VM_OPTS="-XX:-UseSuperWord" MICRO="OPTIONS=-prof perfasm"` No SuperWord, about 90%+ of the time are spent in the main loop, 8x unrolled: 1.13% ?? 0x00007fc0b01d79b0: movslq %r11d,%r14 ;*bastore {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 22 (line 76) ?? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 2.44% ?? 0x00007fc0b01d79b3: movzwl 0x10(%rdi,%r14,2),%r10d ;*caload {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 20 (line 76) ?? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 2.73% ?? 0x00007fc0b01d79b9: mov %r10b,0x10(%r9,%r14,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 22 (line 76) ?? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 5.89% ?? 0x00007fc0b01d79be: movzwl 0x1e(%rdi,%r14,2),%r10d 2.35% ?? 0x00007fc0b01d79c4: movzwl 0x1c(%rdi,%r14,2),%esi 1.51% ?? 0x00007fc0b01d79ca: movzwl 0x1a(%rdi,%r14,2),%r8d 5.34% ?? 0x00007fc0b01d79d0: movzwl 0x18(%rdi,%r14,2),%edx 1.16% ?? 0x00007fc0b01d79d6: movzwl 0x16(%rdi,%r14,2),%ebx 1.77% ?? 0x00007fc0b01d79dc: movzwl 0x14(%rdi,%r14,2),%ebp 2.51% ?? 0x00007fc0b01d79e2: movzwl 0x12(%rdi,%r14,2),%eax ;*caload {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 20 (line 76) ?? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 5.73% ?? 
0x00007fc0b01d79e8: mov %al,0x11(%r9,%r14,1) 13.70% ?? 0x00007fc0b01d79ed: mov %bpl,0x12(%r9,%r14,1) 6.72% ?? 0x00007fc0b01d79f2: mov %bl,0x13(%r9,%r14,1) 9.84% ?? 0x00007fc0b01d79f7: mov %dl,0x14(%r9,%r14,1) 6.01% ?? 0x00007fc0b01d79fc: mov %r8b,0x15(%r9,%r14,1) 6.05% ?? 0x00007fc0b01d7a01: mov %sil,0x16(%r9,%r14,1) 5.24% ?? 0x00007fc0b01d7a06: mov %r10b,0x17(%r9,%r14,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 22 (line 76) ?? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) ?? ; {other} 11.16% ?? 0x00007fc0b01d7a0b: add $0x8,%r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ?? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 23 (line 75) ?? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 0.23% ?? 0x00007fc0b01d7a0f: cmp %ecx,%r11d ?? 0x00007fc0b01d7a12: jl 0x00007fc0b01d79b0 ;*goto {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 26 (line 75) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) With SuperWord this happens: 2.91% ? 0x00007f5f801d6d71: mov %r11d,%r8d ;*aload_0 {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 10 (line 76) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 0.78% ? 0x00007f5f801d6d74: vmovd %r8d,%xmm4 0.81% ? 0x00007f5f801d6d79: movslq %r8d,%r14 ;*bastore {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 22 (line 76) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) ? 
0x00007f5f801d6d7c: movzwl 0x10(%r13,%r14,2),%r10d ;*caload {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 20 (line 76) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 9.42% ? 0x00007f5f801d6d82: mov %r10b,0x10(%rbp,%r14,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 22 (line 76) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 4.40% ? 0x00007f5f801d6d87: movzwl 0x4e(%r13,%r14,2),%r10d 1.10% ? 0x00007f5f801d6d8d: vmovd %r10d,%xmm3 0.16% ? 0x00007f5f801d6d92: movzwl 0x4c(%r13,%r14,2),%r10d 0.29% ? 0x00007f5f801d6d98: vmovd %r10d,%xmm6 2.14% ? 0x00007f5f801d6d9d: movzwl 0x4a(%r13,%r14,2),%r10d ? 0x00007f5f801d6da3: vmovd %r10d,%xmm5 2.17% ? 0x00007f5f801d6da8: movzwl 0x48(%r13,%r14,2),%r10d 0.03% ? 0x00007f5f801d6dae: vmovd %r10d,%xmm8 1.91% ? 0x00007f5f801d6db3: movzwl 0x46(%r13,%r14,2),%r10d ? 0x00007f5f801d6db9: vmovd %r10d,%xmm7 3.76% ? 0x00007f5f801d6dbe: movzwl 0x44(%r13,%r14,2),%r10d ; {other} 0.19% ? 0x00007f5f801d6dc4: vmovd %r10d,%xmm10 1.07% ? 0x00007f5f801d6dc9: movzwl 0x42(%r13,%r14,2),%r10d ? 0x00007f5f801d6dcf: vmovd %r10d,%xmm9 1.68% ? 0x00007f5f801d6dd4: movzwl 0x40(%r13,%r14,2),%r10d ? 0x00007f5f801d6dda: vmovd %r10d,%xmm12 1.81% ? 0x00007f5f801d6ddf: movzwl 0x3e(%r13,%r14,2),%r10d ? 0x00007f5f801d6de5: vmovd %r10d,%xmm11 2.17% ? 0x00007f5f801d6dea: movzwl 0x3c(%r13,%r14,2),%r10d 0.16% ? 0x00007f5f801d6df0: vmovd %r10d,%xmm14 3.37% ? 0x00007f5f801d6df5: movzwl 0x3a(%r13,%r14,2),%r10d ? 0x00007f5f801d6dfb: vmovd %r10d,%xmm13 1.98% ? 0x00007f5f801d6e00: movzwl 0x38(%r13,%r14,2),%r10d ? 0x00007f5f801d6e06: vmovd %r10d,%xmm16 1.59% ? 0x00007f5f801d6e0c: movzwl 0x36(%r13,%r14,2),%r10d ? 0x00007f5f801d6e12: vmovd %r10d,%xmm15 2.49% ? 
0x00007f5f801d6e17: movzwl 0x34(%r13,%r14,2),%r10d 0.13% ? 0x00007f5f801d6e1d: vmovd %r10d,%xmm18 1.94% ? 0x00007f5f801d6e23: movzwl 0x32(%r13,%r14,2),%r10d ? 0x00007f5f801d6e29: vmovd %r10d,%xmm17 2.43% ? 0x00007f5f801d6e2f: movzwl 0x30(%r13,%r14,2),%r10d ? 0x00007f5f801d6e35: vmovd %r10d,%xmm20 1.62% ? 0x00007f5f801d6e3b: movzwl 0x2e(%r13,%r14,2),%r10d 0.84% ? 0x00007f5f801d6e41: vmovd %r10d,%xmm19 2.36% ? 0x00007f5f801d6e47: movzwl 0x2c(%r13,%r14,2),%r10d 0.06% ? 0x00007f5f801d6e4d: vmovd %r10d,%xmm22 3.17% ? 0x00007f5f801d6e53: movzwl 0x2a(%r13,%r14,2),%r10d ? 0x00007f5f801d6e59: vmovd %r10d,%xmm21 2.04% ? 0x00007f5f801d6e5f: movzwl 0x28(%r13,%r14,2),%r10d ? 0x00007f5f801d6e65: vmovd %r10d,%xmm24 1.42% ? 0x00007f5f801d6e6b: movzwl 0x26(%r13,%r14,2),%r10d ? 0x00007f5f801d6e71: vmovd %r10d,%xmm23 1.72% ? 0x00007f5f801d6e77: movzwl 0x24(%r13,%r14,2),%esi 0.03% ? 0x00007f5f801d6e7d: movzwl 0x22(%r13,%r14,2),%r10d 0.03% ? 0x00007f5f801d6e83: movzwl 0x20(%r13,%r14,2),%r11d ? 0x00007f5f801d6e89: movzwl 0x1e(%r13,%r14,2),%r9d ? 0x00007f5f801d6e8f: movzwl 0x1c(%r13,%r14,2),%r8d 0.03% ? 0x00007f5f801d6e95: movzwl 0x1a(%r13,%r14,2),%ebx 0.39% ? 0x00007f5f801d6e9b: movzwl 0x18(%r13,%r14,2),%ecx ? 0x00007f5f801d6ea1: movzwl 0x16(%r13,%r14,2),%edx 1.85% ? 0x00007f5f801d6ea7: movzwl 0x14(%r13,%r14,2),%edi ? 0x00007f5f801d6ead: movzwl 0x12(%r13,%r14,2),%eax ;*caload {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 20 (line 76) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 0.06% ? 0x00007f5f801d6eb3: mov %al,0x11(%rbp,%r14,1) 0.03% ? 0x00007f5f801d6eb8: mov %dil,0x12(%rbp,%r14,1) ? 0x00007f5f801d6ebd: mov %dl,0x13(%rbp,%r14,1) ? 0x00007f5f801d6ec2: mov %cl,0x14(%rbp,%r14,1) ; {other} 0.49% ? 0x00007f5f801d6ec7: mov %bl,0x15(%rbp,%r14,1) ? 0x00007f5f801d6ecc: mov %r8b,0x16(%rbp,%r14,1) 1.78% ? 0x00007f5f801d6ed1: mov %r9b,0x17(%rbp,%r14,1) 0.03% ? 
0x00007f5f801d6ed6: mov %r11b,0x18(%rbp,%r14,1) 0.03% ? 0x00007f5f801d6edb: mov %r10b,0x19(%rbp,%r14,1) ? 0x00007f5f801d6ee0: mov %sil,0x1a(%rbp,%r14,1) 1.88% ? 0x00007f5f801d6ee5: vmovd %xmm23,%r10d ? 0x00007f5f801d6eeb: mov %r10b,0x1b(%rbp,%r14,1) 2.30% ? 0x00007f5f801d6ef0: vmovd %xmm24,%r10d 0.03% ? 0x00007f5f801d6ef6: mov %r10b,0x1c(%rbp,%r14,1) 0.13% ? 0x00007f5f801d6efb: vmovd %xmm21,%r10d ? 0x00007f5f801d6f01: mov %r10b,0x1d(%rbp,%r14,1) 0.03% ? 0x00007f5f801d6f06: vmovd %xmm22,%r10d ? 0x00007f5f801d6f0c: mov %r10b,0x1e(%rbp,%r14,1) ? 0x00007f5f801d6f11: vmovd %xmm19,%r10d 0.03% ? 0x00007f5f801d6f17: mov %r10b,0x1f(%rbp,%r14,1) 1.85% ? 0x00007f5f801d6f1c: vmovd %xmm20,%r10d ? 0x00007f5f801d6f22: mov %r10b,0x20(%rbp,%r14,1) 0.16% ? 0x00007f5f801d6f27: vmovd %xmm17,%r10d ? 0x00007f5f801d6f2d: mov %r10b,0x21(%rbp,%r14,1) 0.03% ? 0x00007f5f801d6f32: vmovd %xmm18,%r10d ? 0x00007f5f801d6f38: mov %r10b,0x22(%rbp,%r14,1) ? 0x00007f5f801d6f3d: vmovd %xmm15,%r10d ? 0x00007f5f801d6f42: mov %r10b,0x23(%rbp,%r14,1) 1.98% ? 0x00007f5f801d6f47: vmovd %xmm16,%r10d ? 0x00007f5f801d6f4d: mov %r10b,0x24(%rbp,%r14,1) 0.13% ? 0x00007f5f801d6f52: vmovd %xmm13,%r10d ? 0x00007f5f801d6f57: mov %r10b,0x25(%rbp,%r14,1) 0.10% ? 0x00007f5f801d6f5c: vmovd %xmm14,%r10d ? 0x00007f5f801d6f61: mov %r10b,0x26(%rbp,%r14,1) ? 0x00007f5f801d6f66: vmovd %xmm11,%r10d ? 0x00007f5f801d6f6b: mov %r10b,0x27(%rbp,%r14,1) 2.04% ? 0x00007f5f801d6f70: vmovd %xmm12,%r10d ? 0x00007f5f801d6f75: mov %r10b,0x28(%rbp,%r14,1) 0.13% ? 0x00007f5f801d6f7a: vmovd %xmm9,%r10d ? 0x00007f5f801d6f7f: mov %r10b,0x29(%rbp,%r14,1) 1.46% ? 0x00007f5f801d6f84: vmovd %xmm10,%r10d ? 0x00007f5f801d6f89: mov %r10b,0x2a(%rbp,%r14,1) 0.26% ? 0x00007f5f801d6f8e: vmovd %xmm7,%r10d 0.03% ? 0x00007f5f801d6f93: mov %r10b,0x2b(%rbp,%r14,1) 2.20% ? 0x00007f5f801d6f98: vmovd %xmm8,%r10d 0.19% ? 0x00007f5f801d6f9d: mov %r10b,0x2c(%rbp,%r14,1) 2.30% ? 0x00007f5f801d6fa2: vmovd %xmm5,%r10d 0.03% ? 
0x00007f5f801d6fa7: mov %r10b,0x2d(%rbp,%r14,1) 1.23% ? 0x00007f5f801d6fac: vmovd %xmm6,%r10d 0.06% ? 0x00007f5f801d6fb1: mov %r10b,0x2e(%rbp,%r14,1) 2.14% ? 0x00007f5f801d6fb6: vmovd %xmm3,%r10d ? 0x00007f5f801d6fbb: mov %r10b,0x2f(%rbp,%r14,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 22 (line 76) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 2.69% ? 0x00007f5f801d6fc0: vmovd %xmm4,%r11d ; {other} ? 0x00007f5f801d6fc5: add $0x20,%r11d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ? ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 23 (line 75) ? ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) 0.32% ? 0x00007f5f801d6fc9: cmp 0x34(%rsp),%r11d ? 0x00007f5f801d6fce: jl 0x00007f5f801d6d71 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - org.openjdk.bench.vm.compiler.VectorSubword::charToByte at 26 (line 75) ; - org.openjdk.bench.vm.compiler.jmh_generated.VectorSubword_charToByte_jmhTest::charToByte_avgt_jmhStub at 15 (line 190) What is happening here? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846688923 From qamai at openjdk.org Fri May 2 08:55:45 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 08:55:45 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Emanuel's reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/5616c23e..58978fbd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=56 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=55-56 Stats: 64 lines in 3 files changed: 13 ins; 7 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri May 2 08:55:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 08:55:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 07:06:24 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/opto/type.hpp line 293: > >> 291: >> 292: template >> 293: const TypeClass* try_cast() const; > > This is a way to get to the `isa_...` via templated types, right? > > I wonder if it might be better to name it `isa_???`, or even just `isa`, so that it is clearer that it is about the `isa` query. 
> > Currently, it is not very clear what it does, until you look at the implementation. That's a bit unfortunate. I like the naming `try_cast` better because it aligns with the semantics of `std::dynamic_cast`. `isa` is a bad name. > src/hotspot/share/opto/type.hpp line 735: > >> 733: * b. t._hi < 0. Similarly, t._lo == jint(t._ulo) and t._hi == jint(t._uhi) >> 734: * >> 735: * c. t._lo < 0, t._hi >= 0. > > Suggestion: > > * c. t._lo < 0, 0 <= t._hi. > > I like ordering numbers according to their value :) Personally I like having a named variable in the lhs and a constant in the rhs, it also makes the case distinction clearer. You have either `t._lo < 0` or `t._lo >= 0`. > src/hotspot/share/opto/type.hpp line 772: > >> 770: private: >> 771: TypeInt(const TypeIntPrototype& t, int w, bool dual); >> 772: static const Type* try_make(const TypeIntPrototype& t, int widen, bool dual); > > Just an idea, very optional. > `try_make` does not say what it does when it fails. > Exception? `nullptr`? `TOP`? > You you could rename it to `try_make_else_null` or `make_or_null`. Something like that. Yah `make_or_top` seems to be a good name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071305594 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071303369 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071302242 From qamai at openjdk.org Fri May 2 08:55:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 08:55:48 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 07:30:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/type.hpp line 708: >> >>> 706: * I.e. x <= y in the signed domain iff x <= y in the unsigned domain >>> 707: * >>> 708: * Then, we have: >> >> The two `Then` are confusing me a little. The assumption is only the first sentence, not `x <= y iff juint(x) <= juint(y)`, right? 
>>
>> Hmm, ah you have an additional Lemma 3.1 here. But it is a little unclear where that starts and ends. Maybe that can be fixed with some indentation?
>
> Or you just cram in the Lemma 3.1 before you start the proof of Lemma 3. That would probably be cleanest.

That's a good idea, I lifted lemma 3.1 to number 3 and made the old lemma 3 lemma 4.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071304030

From epeter at openjdk.org Fri May 2 08:59:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 08:59:54 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 04:44:37 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                                  Baseline                  Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score     Error  Units    Score    Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ±  3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ±  1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ±  1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine.
>> Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>
>  - Address more comments, make test and benchmark more exhaustive
>  - Merge from master
>  - Fix copyright after merge
>  - Fix copyright
>  - Merge
>  - Implement patch with VectorCastNode::implemented
>  - Merge branch 'master' into vectorize-subword
>  - Address comments from review, refactor test
>  - Add new conversions to benchmark
>  - Fix some tests that now vectorize
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/bd7c7789...8c00ef84

It seems the only difference is just the level of unrolling. 8x vs 32x. But no vectorization either way.

    public class Test {
        public static int SIZE = 1024;
        public static byte[] bytes = new byte[SIZE];
        public static char[] chars = new char[SIZE];

        public static void main(String[] args) {
            for (int i = 0; i < 10_000; i++) {
                test();
            }
        }

        public static void test() {
            for (int i = 0; i < SIZE; i++) {
                bytes[i] = (byte)chars[i];
            }
        }
    }

`./java -XX:CompileCommand=compileonly,Test::test -XX:CompileCommand=printcompilation,Test::test -XX:+TraceLoopOpts -XX:-UseSuperWord Test.java`

And then it seems that the 32x unrolling leads to some interesting use of registers. I think that the issue is that first all loads are done, and we don't have enough regular registers, so we start pushing to `xmm` registers. And later move them back to regular registers. That creates a very long loop, and that is not very efficient ?

And we somehow still don't allow vectorization of `LoadUS -> StoreB`. @jaskarth Do you know why?
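
[Editorial sketch, not part of the original message.] For readers following the `LoadUS -> StoreB` question, the Java sketch below spells out what one iteration of the `char` to `byte` loop computes at the language level: the `char` is read as an unsigned 16-bit value, and the cast keeps only its low 8 bits, which are then interpreted as a signed `byte`. The class and method names are hypothetical, and the sketch says nothing about why C2 does or does not vectorize this pattern.

```java
public class CharToByteSketch {
    // Same loop shape as Test.test() in the message above, but on
    // caller-provided arrays so it can be exercised in isolation.
    static void narrow(char[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // One iteration, spelled out step by step.
    static byte narrowOne(char c) {
        int widened = c;           // char is zero-extended, like an unsigned 16-bit load
        int low8 = widened & 0xFF; // only the low 8 bits survive the narrowing
        return (byte) low8;        // the stored byte is the signed view of those bits
    }

    public static void main(String[] args) {
        char[] chars = { 'A', (char) 0x00FF, (char) 0x0141, (char) 0x0080 };
        byte[] bytes = new byte[chars.length];
        narrow(chars, bytes);
        for (int i = 0; i < chars.length; i++) {
            // Both columns agree for every element.
            System.out.println(bytes[i] + " " + narrowOne(chars[i]));
        }
    }
}
```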
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846708541 From epeter at openjdk.org Fri May 2 09:11:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 09:11:07 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 00:50:31 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Emanuel's reviews Comments for `intn_t.hpp` src/hotspot/share/utilities/intn_t.hpp line 36: > 34: > 35: template > 36: class intn_t { Can we have some description of this class? What is it for / what does it do? src/hotspot/share/utilities/intn_t.hpp line 37: > 35: template > 36: class intn_t { > 37: static_assert(n > 0 && n <= 8, "should not be larger than char"); Does `n` stand for the number of bits? Maybe you could write `bits` or even `NUM_BITS` instead? 
src/hotspot/share/utilities/intn_t.hpp line 55: > 53: explicit constexpr operator int() const { > 54: int shift = 32 - n; > 55: return int(_v << shift) >> shift; Suggestion: // Sign extension. int shift = 32 - n; return int(_v << shift) >> shift; Correct? src/hotspot/share/utilities/intn_t.hpp line 58: > 56: } > 57: > 58: constexpr static int min = std::numeric_limits::max() << (n - 1); Ok, so the lower `n-1` bits are zero, and the uppermost is `1`. Why not just shift up a `1`? Or do you actually care about the upper bits? What exactly is the general assumption about the upper bits? src/hotspot/share/utilities/intn_t.hpp line 134: > 132: }; > 133: > 134: } Could use indentation to make clear where the namespace starts and ends. Or at least a comment like this: Suggestion: namespace std { template class numeric_limits> { public: constexpr static intn_t min() { return intn_t(intn_t::min); } constexpr static intn_t max() { return intn_t(intn_t::max); } }; template class numeric_limits> { public: constexpr static uintn_t min() { return uintn_t(uintn_t::min); } constexpr static uintn_t max() { return uintn_t(uintn_t::max); } }; } // namespace std ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2811428757 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071263630 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071266208 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071269567 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071286524 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071324433 From epeter at openjdk.org Fri May 2 09:11:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 09:11:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:17:41 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull 
request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/utilities/intn_t.hpp line 37: > >> 35: template >> 36: class intn_t { >> 37: static_assert(n > 0 && n <= 8, "should not be larger than char"); > > Does `n` stand for the number of bits? Maybe you could write `bits` or even `NUM_BITS` instead? Maybe it's not worth it. But then it should be explained in the class description comment above :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071267929 From epeter at openjdk.org Fri May 2 09:11:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 09:11:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:19:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/utilities/intn_t.hpp line 37: >> >>> 35: template >>> 36: class intn_t { >>> 37: static_assert(n > 0 && n <= 8, "should not be larger than char"); >> >> Does `n` stand for the number of bits? Maybe you could write `bits` or even `NUM_BITS` instead? > > Maybe it's not worth it. But then it should be explained in the class description comment above :) Suggestion: static_assert(0 < n && n <= 8, "should not be larger than char"); And I like things ordered according to their value :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071284869 From epeter at openjdk.org Fri May 2 09:19:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 09:19:50 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 04:44:37 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. 
>> Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                               Baseline                 Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits:
>
>  - Address more comments, make test and benchmark more exhaustive
>  - Merge from master
>  - Fix copyright after merge
>  - Fix copyright
>  - Merge
>  - Implement patch with VectorCastNode::implemented
>  - Merge branch 'master' into vectorize-subword
>  - Address comments from review, refactor test
>  - Add new conversions to benchmark
>  - Fix some tests that now vectorize
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/bd7c7789...8c00ef84

Ah, in your PR description you say:

> Currently, only narrowing casts are supported as I wanted to re-use existing VectorCastX2Y logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE.
Ok, I jumped into the debugger. The issue is when we check `VectorCastNode::opcode`, for `bt = T_CHAR`, we end in the `default` case, and return `0`. I suppose that is the same that happens for `charToInt` and `charToLong`?

But then why does `charToShort` vectorize? Ah, because they already have the same element size. Makes sense.

Ok then. We should file an RFE for `charToX` vector casts, right?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846742805

From epeter at openjdk.org Fri May 2 09:19:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 09:19:51 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan wrote:

>> @jaskarth Let me know if there is anything we can help you with here :)
>
> @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes. For the benchmark, I got these results on my machine:
>
>                                               Baseline                 Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.byteToChar     1024  avgt   12  252.954 ±  4.129  ns/op   24.219 ± 0.453  ns/op  (10.4x)
> VectorSubword.byteToInt      1024  avgt   12  194.707 ±  3.584  ns/op   38.353 ± 0.637  ns/op  (5.07x)
> VectorSubword.byteToLong     1024  avgt   12   73.645 ±  1.418  ns/op   70.521 ± 0.470  ns/op  (no change)
> VectorSubword.byteToShort    1024  avgt   12  252.647 ±  3.738  ns/op   22.664 ± 0.449  ns/op  (11.1x)
> VectorSubword.charToByte     1024  avgt   12  236.396 ±  3.893  ns/op  228.710 ± 1.967  ns/op  (no change)
> VectorSubword.charToInt      1024  avgt   12  179.673 ±  2.811  ns/op  173.764 ± 1.150  ns/op  (no change)
> VectorSubword.charToLong     1024  avgt   12  184.867 ±  3.079  ns/op  177.999 ± 1.312  ns/op  (no change)
> VectorSubword.charToShort    1024  avgt   12   24.385 ±  1.822  ns/op   22.375 ± 1.980  ns/op  (no change)
> VectorSubword.intToByte      1024  avgt   12  190.949 ±  1.475  ns/op   49.376 ± 1.383  ns/op  (3.86x)
> VectorSubword.intToChar      1024  avgt   12  182.862 ±  3.708  ns/op   44.344 ± 4.513  ns/op  (4.12x)
> VectorSubword.intToLong      1024  avgt   12   76.072 ±  1.153  ns/op   73.382 ± 0.294  ns/op  (no change)
> VectorSubword.intToShort     1024  avgt   12  184.362 ±  1.938  ns/op   45.556 ± 3.323  ns/op  (4.04x)
> VectorSubword.longToByte     1024  avgt   12  150.766 ±  3.475  ns/op  146.651 ± 0.742  ns/op  (no change)
> VectorSubword.longToChar     1024  avgt   12  121.764 ±  1.323  ns/op  117.068 ± 1.891  ns/op  (no change)
> VectorSubword.longToInt      1024  avgt   12   83.761 ±  2.140  ns/op   82.084 ± 0.930  ns/op  (no change)
> VectorSubword.longToShort    1024  avgt   12  132.293 ± 23.046  ns/op  115.883 ± 0.834  ns/op  (+ 12.4%)
> VectorSubword.shortToByte    1024  avgt   12  253.387 ±  5.972  ns/op   27.591 ± 1.311  ns/op  (9.18x)
> VectorSubword.shortToChar    1024  avgt   12   21.446 ±  1.914  ns/op   20.608 ± 1.593  ns/op  (no change)
> VectorSubword.shortToInt     1024  avgt   12  187.109 ±  3.372  ns/op   36.818 ± 0.989  ns/op  (5.08x)
> VectorSubword.shortToLong    1024  avgt   12   75.448 ±  0.930  ns/op   72.835 ± 0.507  ns/op  (no change)
>
> Interestingly, eve...

@jaskarth And of course you already filed it! I'm just playing catch-up with you.

[JDK-8349562](https://bugs.openjdk.org/browse/JDK-8349562) Add autovectorizer support for char casts on x86

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846746044

From epeter at openjdk.org Fri May 2 09:25:46 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 09:25:46 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 05:19:41 GMT, Jasmine Karthikeyan wrote:

>> @jaskarth Let me know if there is anything we can help you with here :)
>
> @eme64 Thank you for the comments! I've updated the test and benchmark to be more exhaustive, and applied the suggested changes.
> For the benchmark, I got these results on my machine:
>
>                                               Baseline                 Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.byteToChar     1024  avgt   12  252.954 ±  4.129  ns/op   24.219 ± 0.453  ns/op  (10.4x)
> VectorSubword.byteToInt      1024  avgt   12  194.707 ±  3.584  ns/op   38.353 ± 0.637  ns/op  (5.07x)
> VectorSubword.byteToLong     1024  avgt   12   73.645 ±  1.418  ns/op   70.521 ± 0.470  ns/op  (no change)
> VectorSubword.byteToShort    1024  avgt   12  252.647 ±  3.738  ns/op   22.664 ± 0.449  ns/op  (11.1x)
> VectorSubword.charToByte     1024  avgt   12  236.396 ±  3.893  ns/op  228.710 ± 1.967  ns/op  (no change)
> VectorSubword.charToInt      1024  avgt   12  179.673 ±  2.811  ns/op  173.764 ± 1.150  ns/op  (no change)
> VectorSubword.charToLong     1024  avgt   12  184.867 ±  3.079  ns/op  177.999 ± 1.312  ns/op  (no change)
> VectorSubword.charToShort    1024  avgt   12   24.385 ±  1.822  ns/op   22.375 ± 1.980  ns/op  (no change)
> VectorSubword.intToByte      1024  avgt   12  190.949 ±  1.475  ns/op   49.376 ± 1.383  ns/op  (3.86x)
> VectorSubword.intToChar      1024  avgt   12  182.862 ±  3.708  ns/op   44.344 ± 4.513  ns/op  (4.12x)
> VectorSubword.intToLong      1024  avgt   12   76.072 ±  1.153  ns/op   73.382 ± 0.294  ns/op  (no change)
> VectorSubword.intToShort     1024  avgt   12  184.362 ±  1.938  ns/op   45.556 ± 3.323  ns/op  (4.04x)
> VectorSubword.longToByte     1024  avgt   12  150.766 ±  3.475  ns/op  146.651 ± 0.742  ns/op  (no change)
> VectorSubword.longToChar     1024  avgt   12  121.764 ±  1.323  ns/op  117.068 ± 1.891  ns/op  (no change)
> VectorSubword.longToInt      1024  avgt   12   83.761 ±  2.140  ns/op   82.084 ± 0.930  ns/op  (no change)
> VectorSubword.longToShort    1024  avgt   12  132.293 ± 23.046  ns/op  115.883 ± 0.834  ns/op  (+ 12.4%)
> VectorSubword.shortToByte    1024  avgt   12  253.387 ±  5.972  ns/op   27.591 ± 1.311  ns/op  (9.18x)
> VectorSubword.shortToChar    1024  avgt   12   21.446 ±  1.914  ns/op   20.608 ± 1.593  ns/op  (no change)
> VectorSubword.shortToInt     1024  avgt   12  187.109 ±  3.372  ns/op   36.818 ± 0.989  ns/op  (5.08x)
> VectorSubword.shortToLong    1024  avgt   12   75.448 ±  0.930  ns/op   72.835 ± 0.507  ns/op  (no change)
>
> Interestingly, eve...

@jaskarth Anyway, I'm super happy that you are working on patching this hole :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2846773124

From epeter at openjdk.org Fri May 2 09:41:03 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 09:41:03 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57]
In-Reply-To:
References:
Message-ID: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com>

On Fri, 2 May 2025 08:55:45 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>>
>> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>>
>> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>>
>> Please kindly review, thanks a lot.
>>
>> Testing
>>
>> - [x] GHA
>> - [x] Linux x64, tier 1-4
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   Emanuel's reviews

And a few comments about the gtests :)

test/hotspot/gtest/opto/test_rangeinference.cpp line 113:

> 111: for (int i = 0; i < parameters; i++) {
> 112: S a = uniform_random();
> 113: S b = uniform_random();

How expensive would it be to run a few more here?
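
For context on what these randomized checks are probing: the set described in the PR text can be spelled out as a plain membership predicate. The sketch below is only an illustration in Java and not code from the patch. `IntConstraints` and its field names are hypothetical, and no canonicalization is attempted; it just evaluates the six conditions as stated.

```java
// Illustrative sketch only: IntConstraints is a hypothetical stand-in,
// not HotSpot's TypeInt. It evaluates the membership conditions from the
// PR description literally, without tightening them.
final class IntConstraints {
    final int lo, hi;      // signed bounds
    final int ulo, uhi;    // unsigned bounds, stored in int and compared unsigned
    final int zeros, ones; // bits known to be 0 / bits known to be 1

    IntConstraints(int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        this.lo = lo;
        this.hi = hi;
        this.ulo = ulo;
        this.uhi = uhi;
        this.zeros = zeros;
        this.ones = ones;
    }

    // x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
    //   && (x & zeros) == 0 && (x & ones) == ones
    boolean contains(int x) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }
}
```

With signed bounds [0, 3] alone, `new IntConstraints(0, 3, 0, -1, 0, 0)` accepts the same values as the tightened form `new IntConstraints(0, 3, 0, 3, ~3, 0)`, which is the dependence between constraints that the PR description mentions.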
test/hotspot/gtest/utilities/test_intn_t.cpp line 38: > 36: if (i < intn_t::max) { > 37: ASSERT_TRUE(intn_t(i) < intn_t(i + 1)); > 38: } How about a check for the overflow case? test/hotspot/gtest/utilities/test_intn_t.cpp line 51: > 49: test_intn_t<7>(); > 50: test_intn_t<8>(); > 51: } And how about some tests for `uintn_t`? ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2811578117 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071352975 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071360729 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071361519 From epeter at openjdk.org Fri May 2 09:41:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 09:41:04 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> References: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> Message-ID: On Fri, 2 May 2025 09:37:04 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > test/hotspot/gtest/utilities/test_intn_t.cpp line 51: > >> 49: test_intn_t<7>(); >> 50: test_intn_t<8>(); >> 51: } > > And how about some tests for `uintn_t`? Those also have a lot more operations to test... 
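
On the overflow case and the sign-extension remark from the `intn_t.hpp` review: a small Java sketch of the wrap-around behaviour may help to pin down what such a test would assert. This is not the `intn_t` implementation. `SmallInt` and its methods are hypothetical helpers that mirror the `int(_v << shift) >> shift` expression for an `n`-bit value, relying on Java's well-defined `int` shifts.

```java
// Illustrative sketch only: SmallInt is a hypothetical helper, not intn_t.
final class SmallInt {
    // Interprets the low n bits of v as a signed n-bit value (1 <= n <= 32).
    static int signExtend(int v, int n) {
        int shift = 32 - n;
        // Shift the n-bit sign bit up to bit 31, then arithmetic-shift back
        // down so that it is copied into all upper bits.
        return (v << shift) >> shift;
    }

    // Smallest and largest values representable in n signed bits.
    static int min(int n) { return -(1 << (n - 1)); }
    static int max(int n) { return (1 << (n - 1)) - 1; }
}
```

Under these assumptions, `SmallInt.signExtend(SmallInt.max(n) + 1, n)` equals `SmallInt.min(n)` for every `n` from 1 to 8, i.e. incrementing the maximum wraps around to the minimum.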
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071362254

From epeter at openjdk.org Fri May 2 09:44:03 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 09:44:03 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53]
In-Reply-To: <4J2bhK1v1UcrBUtW7tSR6KR9lcYZ2IrUYxeic6vwfZg=.2c3489ac-dd4e-4a7b-97d6-5deae5223354@github.com>
References: <4J2bhK1v1UcrBUtW7tSR6KR9lcYZ2IrUYxeic6vwfZg=.2c3489ac-dd4e-4a7b-97d6-5deae5223354@github.com>
Message-ID:

On Fri, 2 May 2025 00:56:07 GMT, Quan Anh Mai wrote:

>> Can you do the analogue with the else (one violation) case?
>> That one is probably a bit harder, but I have faith in you ;)
>
> @eme64 Thanks, yes I feel that the progress is much better now. Hope we can finish this soon.

@merykitty Ok, now I'm through the whole thing. I am being more thorough and nit-picky in this review than in others, because it is going to be quite a substantial change, and quite at the core of many IGVN optimizations going forward.

Again: I'm really impressed by the bit tricks here, and very happy that we now have quite solid explanations / proofs.

Once you tell me that you responded to all comments here, I can make another pass over everything :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2846806729

From epeter at openjdk.org Fri May 2 10:04:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 10:04:51 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15]
In-Reply-To:
References:
Message-ID:

On Sun, 27 Apr 2025 11:53:31 GMT, kuaiwei wrote:

>> In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine.
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
>> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
>> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
>> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
>> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
>> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
>> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
>> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
>> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
>> |Merg...
> > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error on mac and windows Continuing our conversation from here: https://github.com/openjdk/jdk/pull/24023#discussion_r2063697381 > I added a flag, _merge_memops_checked in AddNode. It will be checked when search MergePrimitiveLoads::has_no_merge_load_combine_below() . So these cases can be handled. And tests are added. I'm not a great fan of additional flags in nodes. It is not something I have ever seen. It also increases the size of every `AddNode`, which requires more memory. And there are a lot of nodes that inherit from `AddNode`, and many of them are not even relevant for this optimization, right? Plus: you might have checked a node before, and marked it as `merge_memops_checked`. But then its inputs could be optimized, to a point where that node could be optimized again. Maybe we don't care too much about that, and don't expect that to happen at the time we run the MergeStores and MergeLoads phase. So maybe this aspect is ok, even if not perfect. I would feel better if we could just do it with pattern matching only. Why can you not just go down to the `Or` below, and then look at the other side of that `Or`, and see if that branch could be merged in or not? src/hotspot/share/opto/addnode.cpp line 1035: > 1033: return nullptr; > 1034: } > 1035: _combine->set_merge_memops_checked(true); Referenced in comment. src/hotspot/share/opto/addnode.cpp line 1041: > 1039: Node* oper = _combine; > 1040: NOT_PRODUCT(int steps = 0;) // prevent dead loop in bad graph > 1041: while (load == nullptr NOT_PRODUCT(&& steps < 30)) { And just saw this when flying by. What "bad graph" is this? What is the "dead loop" here? What happens in product, since there you don't have this check? 
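
To make the shape under discussion concrete: the Java sketch below shows the kind of source pattern that gives rise to a chain of `Or` / `LShift` operations over adjacent byte loads, plus the case where one more `Or` sits below two such chains. The class and method names are hypothetical and little-endian byte order is assumed. It only illustrates the source-level semantics and says nothing about what C2 actually does with it.

```java
// Illustrative sketch only: MergeLoadPattern and its methods are hypothetical
// names, not taken from the patch or from MergeLoadBench.
final class MergeLoadPattern {
    // Four adjacent byte loads combined with shifts and ORs (little-endian).
    // Semantically this is a single 4-byte load from a[off..off+3].
    static int readIntLE(byte[] a, int off) {
        return (a[off] & 0xff)
             | ((a[off + 1] & 0xff) << 8)
             | ((a[off + 2] & 0xff) << 16)
             | ((a[off + 3] & 0xff) << 24);
    }

    // Two independent merge patterns whose results feed one more OR.
    // The final OR is not itself part of either load-merging chain.
    static int orOfTwo(byte[] a) {
        int x = readIntLE(a, 0);
        int y = readIntLE(a, 4);
        return x | y;
    }
}
```

In `orOfTwo`, the last `|` has a complete merge pattern on each side, which is the situation where "look at the other side of that `Or`" would have to decide whether that branch belongs to the same chain or not.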
------------- PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2811617778 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071379962 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071377036 From epeter at openjdk.org Fri May 2 10:04:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 10:04:51 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: <56jHI-HsjxkhkEQ5Dciu-thfFFPCErUEdzuzHr5k7HA=.168c6cdc-52b5-4c96-9068-feb00f621e61@github.com> Message-ID: On Mon, 28 Apr 2025 13:46:46 GMT, kuaiwei wrote: >> Yes, it's the limit of this implementation. I need to find the last `combine` node which can be replaced with merged load. But if it's used by other `Or` operator. So far I can not find a good way to distinguish these two cases. >> I may add a new `checked` flag for combine operator. For case like: >> >> int x = (... merge load pattern with OR ...); >> int y = (... merge load pattern with OR ...); >> int z = x | y; >> >> When IGVN check the `Or` in `x | y`, it's the last one of combine nodes. But it will fail to merge because `collect_merge_list` can not find a related `Load` for it. And I can mark it as `checked`. So when IGVN check the `Or` nodes in line 1 and line2. it will find the next `Or` is checked and get the right one. >> >> Do you think if it is doable? Other suggestion is appreciated. Thanks. > > I added a flag, `_merge_memops_checked` in `AddNode`. It will be checked when search `MergePrimitiveLoads::has_no_merge_load_combine_below()` . So these cases can be handled. And tests are added. 
I responded below https://github.com/openjdk/jdk/pull/24023#pullrequestreview-2811617778

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071389572

From epeter at openjdk.org Fri May 2 10:37:56 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 2 May 2025 10:37:56 GMT
Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15]
In-Reply-To:
References:
Message-ID:

On Sun, 27 Apr 2025 11:53:31 GMT, kuaiwei wrote:

>> In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine.
>>
>> The benchmark result of MergeLoadBench.java
>> AMD EPYC 9T24 96-Core Processor:
>>
>> |name | -MergeLoads | +MergeLoads |delta|
>> |---|---|---|---|
>> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
>> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
>> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
>> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
>> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
>> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
>> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
>> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
>> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
>> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
>> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
>> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
>> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
>> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
>> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
>> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
>> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
>> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
>> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
>> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
>> |MergeLoadBench.getLongBU
|14042.046 |2512.784 | -11529.26 | >> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | >> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | >> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | >> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | >> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | >> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | >> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | >> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | >> |Merg... > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Fix build error on mac and windows I scanned just a part of the code quickly, and have some more comments :) src/hotspot/share/opto/addnode.cpp line 833: > 831: void dump() { > 832: tty->print_cr("MergeLoadInfo: load: %s(%d), combine: %d, shift: %d", > 833: _load->Name(), _load->_idx, _combine->_idx, _shift); What do I get if I dump and invalid `MergeLoadInfo`? src/hotspot/share/opto/addnode.cpp line 868: > 866: private: > 867: // Detect if the embedding combine node is last one of combine operators > 868: bool has_no_merge_load_combine_below( ) const; Suggestion: bool has_no_merge_load_combine_below() const; src/hotspot/share/opto/addnode.cpp line 974: > 972: // Construct merge information item from input load > 973: MergeLoadInfo MergePrimitiveLoads::merge_load_info(LoadNode* load) const { > 974: const MergeLoadInfo invalid = MergeLoadInfo(); I would suggest to drop this line, and replace all uses with a `MergeLoadInfo::make_invalid()`, which just returns you such an invalid `MergeLoadInfo`. src/hotspot/share/opto/addnode.cpp line 975: > 973: MergeLoadInfo MergePrimitiveLoads::merge_load_info(LoadNode* load) const { > 974: const MergeLoadInfo invalid = MergeLoadInfo(); > 975: const Node* check = bypass_i2l(load); What does `check` stand for? Might `load_use` be more descriptive? 
src/hotspot/share/opto/addnode.cpp line 981: > 979: // Check the Load node has the pattern "(Or (LShift (Load .. ) ConI) ..)" or "(Or (Load ..) ..)" > 980: for (DUIterator_Fast imax, iter = check->fast_outs(imax); iter < imax; iter++) { > 981: Node *out = check->fast_out(iter); Do you need to loop here? Do all cases we expect to optimize only have a single use of `check`? Or what exactly would happen here if we had multiple uses? Could be good if you had a regression test that triggers such a case with multiple uses here. src/hotspot/share/opto/addnode.cpp line 990: > 988: shift = 0; > 989: } else { > 990: // Too much Or usages "Or usages" are countable. "much" is for uncountable, "many" for countable ;) Suggestion: // Too many Or usages src/hotspot/share/opto/addnode.cpp line 1001: > 999: if (shift_oper->outcnt() != 1 || // Shift should has only one usage > 1000: !is_supported_combine_opcode(shift_oper->unique_out()->Opcode()) || // Not used by combine operator > 1001: !shift_oper->in(2)->is_ConI()) { // Not shift by constant Please fix the alignment of the comments :) src/hotspot/share/opto/addnode.cpp line 1039: > 1037: // go up through combine operators to find load node > 1038: LoadNode* load = nullptr; > 1039: Node* oper = _combine; Can you show some patterns, i.e. what would `load` and `oper` be after this? src/hotspot/share/opto/addnode.cpp line 1054: > 1052: } else { > 1053: // not found > 1054: add_operators_to_worklist(_combine); Why are you doing this? If an input still needs to be transformed, then it should be put onto the work list by the inputs of that operator. And not by `combine`, i.e. the use of that operator. Plus: if those operators are now transformed, would we actually ever get back here and attempt optimizing again? Your flag is now already set with `set_merge_memops_checked`, so we would not get here again, right? 
------------- PR Review: https://git.openjdk.org/jdk/pull/24023#pullrequestreview-2811640092 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071398672 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071391711 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071399977 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071405460 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071406853 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071400902 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071402266 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071419625 PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071423301 From epeter at openjdk.org Fri May 2 10:37:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 10:37:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 10:18:23 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build error on mac and windows > > src/hotspot/share/opto/addnode.cpp line 981: > >> 979: // Check the Load node has the pattern "(Or (LShift (Load .. ) ConI) ..)" or "(Or (Load ..) ..)" >> 980: for (DUIterator_Fast imax, iter = check->fast_outs(imax); iter < imax; iter++) { >> 981: Node *out = check->fast_out(iter); > > Do you need to loop here? > Do all cases we expect to optimize only have a single use of `check`? > Or what exactly would happen here if we had multiple uses? > > Could be good if you had a regression test that triggers such a case with multiple uses here. Yeah, it seems you actually check when looping if you find a second one, and then return `invalid`. And the default case also gets `invalid`. 
Hence, I really see no point in looping here, you should just use `unique_out`, right?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2071417802

From chagedorn at openjdk.org Fri May 2 10:40:54 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 2 May 2025 10:40:54 GMT
Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 07:54:31 GMT, Saranya Natarajan wrote:

>> Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) creates a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion.
>>
>> Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`.
>>
>> Fix: The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` call outside the elimination loop.
>>
>> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit node elimination in `expand_macro_nodes` is unintuitive, as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), and the current fix should be moved along with the other elimination code.
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > added the test to JTREG A few suggestions, otherwise, it looks good to me, too! src/hotspot/share/opto/macro.cpp line 2460: > 2458: // Returns true if a failure occurred. > 2459: bool PhaseMacroExpand::expand_macro_nodes() { > 2460: // Perform refining of strip mined loop node before we start to expand. Maybe you can put the new code into a separate method. src/hotspot/share/opto/macro.cpp line 2462: > 2460: // Perform refining of strip mined loop node before we start to expand. > 2461: for (int i = C->macro_count(); i > 0; i--) { > 2462: Node* n = C->macro_node(i-1); Suggestion: Node* n = C->macro_node(i - 1); src/hotspot/share/opto/macro.cpp line 2463: > 2461: for (int i = C->macro_count(); i > 0; i--) { > 2462: Node* n = C->macro_node(i-1); > 2463: if (n->Opcode() == Op_OuterStripMinedLoop) { You could also use `is_OuterStripMinedLoop()`: Suggestion: if (n->is_OuterStripMinedLoop()) { test/hotspot/jtreg/compiler/macronodes/TestLoopStripMiningInMacroElimination.java line 48: > 46: } > 47: } > 48: Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24890#pullrequestreview-2811688174 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2071426238 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2071423368 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2071424370 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2071424668 From qamai at openjdk.org Fri May 2 11:28:21 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:28:21 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: <13S0zB0hQkUJfH4HmdmZMSeTb6ODo7OYW2lxfaxs_nA=.2a4bc9e1-3aba-4258-9d78-18bc305a6ab8@github.com> On Fri, 2 May 2025 08:34:08 GMT, Emanuel Peter wrote: >> Maybe it's not worth it. But then it should be explained in the class description comment above :) > > Suggestion: > > static_assert(0 < n && n <= 8, "should not be larger than char"); > > And I like things ordered according to their value :) > Does `n` stand for the number of bits? Maybe you could write `bits` or even `NUM_BITS` instead? I changed it to `nbits`. > And I like things ordered according to their value :) Fine, this one follows you :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071471665 From qamai at openjdk.org Fri May 2 11:28:21 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:28:21 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v58] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. 
These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: intn_t refinements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/58978fbd..95e5a23b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=57 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=56-57 Stats: 70 lines in 2 files changed: 31 ins; 0 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri May 2 11:28:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:28:22 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:35:44 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > src/hotspot/share/utilities/intn_t.hpp line 58: > >> 56: } >> 57: >> 58: constexpr static int min = std::numeric_limits::max() << (n - 1); > > Ok, so the lower `n-1` bits are zero, and the uppermost is `1`. Why not just shift up a `1`? Or do you actually care about the upper bits? 
What exactly is the general assumption about the upper bits? This is an `int`, not an `intn_t`, so we need all the upper bits be 1, too. > src/hotspot/share/utilities/intn_t.hpp line 134: > >> 132: }; >> 133: >> 134: } > > Could use indentation to make clear where the namespace starts and ends. Or at least a comment like this: > Suggestion: > > namespace std { > > template > class numeric_limits> { > public: > constexpr static intn_t min() { return intn_t(intn_t::min); } > constexpr static intn_t max() { return intn_t(intn_t::max); } > }; > > template > class numeric_limits> { > public: > constexpr static uintn_t min() { return uintn_t(uintn_t::min); } > constexpr static uintn_t max() { return uintn_t(uintn_t::max); } > }; > > } // namespace std Added the comment marking the end of the namespace. We often do not indent for namespace in C++ I believe. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071472663 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071473294 From qamai at openjdk.org Fri May 2 11:28:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:28:22 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> References: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> Message-ID: On Fri, 2 May 2025 09:36:40 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > test/hotspot/gtest/utilities/test_intn_t.cpp line 38: > >> 36: if (i < intn_t::max) { >> 37: ASSERT_TRUE(intn_t(i) < intn_t(i + 1)); >> 38: } > > How about a check for the overflow case? Done! 
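As an aside on the `intn_t` `min` discussion above, here is a minimal standalone sketch (not the code from `intn_t.hpp`) of why the constant is built by shifting an all-ones pattern rather than a single `1`. The helper names `intn_min` and `sign_extend` are made up for illustration, and the use of `unsigned int` for the limits type and the template parameter is an assumption, since the template arguments did not survive in the archived text.

```cpp
#include <limits>

// Hypothetical helper: the smallest value of an nbits-wide signed integer,
// stored in a plain int. Shifting an all-ones pattern keeps every bit above
// the sign bit set, which is what a sign-extended int needs.
// Note: the unsigned-to-int conversion is modular since C++20 and
// implementation-defined (wrapping on mainstream compilers) before that.
template <unsigned int nbits>
constexpr int intn_min() {
  static_assert(0 < nbits && nbits <= 8, "should not be larger than char");
  return static_cast<int>(std::numeric_limits<unsigned int>::max() << (nbits - 1));
}

// Hypothetical helper: interpret the low nbits of raw as a signed value.
template <unsigned int nbits>
constexpr int sign_extend(unsigned int raw) {
  static_assert(0 < nbits && nbits <= 8, "should not be larger than char");
  const unsigned int mask = (1u << nbits) - 1u;      // low nbits set
  const unsigned int sign = 1u << (nbits - 1);       // the sign bit
  const unsigned int low = raw & mask;               // drop the upper bits
  return static_cast<int>((low ^ sign) - sign);      // propagate the sign bit
}

static_assert(intn_min<3>() == -4, "3-bit minimum is -4, upper bits all set");
static_assert((1 << (3 - 1)) == 4, "shifting a single 1 gives +4 instead");
static_assert(sign_extend<3>(0x4u) == intn_min<3>(), "0b100 sign-extends to the minimum");
static_assert(sign_extend<3>(0x7u) == -1, "0b111 sign-extends to -1");
static_assert(sign_extend<3>(0x3u) == 3, "0b011 stays positive");
```

The `static_assert`s make the point of the reply visible at compile time: as an `int`, the 3-bit minimum is `-4`, so the bits above the sign bit have to be ones as well.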
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071475074 From jbhateja at openjdk.org Fri May 2 11:31:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 2 May 2025 11:31:01 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel AVX10 will be enumerated as Version 2 (denoted as Intel AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel AVX10 (Version 1, or Intel AVX10.1) that only enumerates the Intel AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains one new commit since the last revision: Refactoring code to create a seperate VM_Features class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/4a614be8..a9258174 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=07-08 Stats: 63 lines in 3 files changed: 32 ins; 22 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From qamai at openjdk.org Fri May 2 11:38:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:38:42 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add some more sanity static_asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/95e5a23b..25a6f9b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=58 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=57-58 Stats: 9 lines in 2 files changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri May 2 11:38:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:38:44 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> References: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> Message-ID: <-hL8rLzWSetB99edTK1hWAp3i_bpzOvmb1ZGDyc5MgM=.7bee18ae-e8a0-48bb-9325-91802aaaca4e@github.com> On Fri, 2 May 2025 09:30:17 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Emanuel's reviews > > test/hotspot/gtest/opto/test_rangeinference.cpp line 113: > >> 111: for (int i = 0; i < parameters; i++) { >> 112: S a = uniform_random(); >> 113: S b = uniform_random(); > > How expensive would it be to run a few more here? We can run it more, but this one is the simple cases, so I thought it is not needed much, raising `parameters` to 1000 does not affect the test runtime much, so I did it anyway. 
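For context on what these range-inference tests exercise, the membership condition from the PR description can be written out as a small standalone predicate. This is only a sketch under assumptions: `IntConstraints` and `contains` are hypothetical names, the fields merely mirror the symbols in the description (`lo`, `hi`, `ulo`, `uhi`, `zeros`, `ones`), and none of it is the actual `TypeInt` layout.

```cpp
#include <cstdint>

// Hypothetical stand-in for the constraints carried by a TypeInt; the field
// names follow the PR description, not the actual HotSpot class.
struct IntConstraints {
  int32_t lo, hi;        // signed bounds
  uint32_t ulo, uhi;     // unsigned bounds
  uint32_t zeros, ones;  // bits known to be 0 / bits known to be 1
};

// x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
//   && (x & zeros) == 0 && (x & ones) == ones
constexpr bool contains(const IntConstraints& t, int32_t x) {
  const uint32_t ux = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&
         ux >= t.ulo && ux <= t.uhi &&
         (ux & t.zeros) == 0 && (ux & t.ones) == t.ones;
}

// Only the signed range [0, 3] is stated; the other constraints are trivial.
constexpr IntConstraints loose{0, 3, 0u, UINT32_MAX, 0u, 0u};
// The same set of values with unsigned bounds and known bits tightened:
// unsigned range [0, 3], and every bit except the lowest two known to be 0.
constexpr IntConstraints tight{0, 3, 0u, 3u, ~3u, 0u};

static_assert(contains(loose, 2) && contains(tight, 2), "2 is in both");
static_assert(!contains(loose, 4) && !contains(tight, 4), "4 is in neither");
static_assert(!contains(loose, -1) && !contains(tight, -1), "-1 is in neither");
```

`loose` and `tight` describe the same set of values, which is the `[0, 3]` example from the description; canonicalization, as described there, is about always arriving at the tight form.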
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071485731 From qamai at openjdk.org Fri May 2 11:38:47 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:38:47 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: References: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> Message-ID: On Fri, 2 May 2025 09:37:33 GMT, Emanuel Peter wrote: >> test/hotspot/gtest/utilities/test_intn_t.cpp line 51: >> >>> 49: test_intn_t<7>(); >>> 50: test_intn_t<8>(); >>> 51: } >> >> And how about some tests for `uintn_t`? > > Those also have a lot more operations to test... The thing is that the other operations are so trivial that it would be counter-productive to test them, it is like testing `add(int x, int y) { return x + y; }` :) The operations I test here are the non-trivial ones, that is sign extension and comparison. I have added some sanity `static_assert` to catch off-by-one errors, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071486369 From qamai at openjdk.org Fri May 2 11:42:05 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 11:42:05 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: <4J2bhK1v1UcrBUtW7tSR6KR9lcYZ2IrUYxeic6vwfZg=.2c3489ac-dd4e-4a7b-97d6-5deae5223354@github.com> Message-ID: On Fri, 2 May 2025 09:41:13 GMT, Emanuel Peter wrote: >> @eme64 Thanks, yes I feel that the progress is much better now. Hope we can finish this soon. > > @merykitty Ok, now I'm through the whole thing. I am doing this review more thorough and nit-picky than others, because it is going to be quite a substantial change, and quite at the core of many IGVN optimizations going forward. > > Again: I'm really impressed by the bit tricks here, and very happy that we now have quite solid explanations / proofs. 
> > Once you tell me that you responded to all comments here, I can make another pass over everything :) @eme64 Please let me know if you disagree with any answer from me. I am fairly confident in this patch, especially with the exhaustive tests exercising `intn_t` values. After this patch, I will work on allowing the test infrastructure to work with `Type` instances directly, templatizing `TypeInt` and `TypeLong` so that we can work with `TypeInt>`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2847009418 From mli at openjdk.org Fri May 2 12:24:17 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 2 May 2025 12:24:17 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB Message-ID: Hi, Can you help to review this patch to add SUADD/SADD/SUSUB/SSUB for vector api? Thanks! ## Test running in progress ... ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/25005/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25005&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355699 Stats: 168 lines in 4 files changed: 146 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/25005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25005/head:pull/25005 PR: https://git.openjdk.org/jdk/pull/25005 From chagedorn at openjdk.org Fri May 2 13:49:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 2 May 2025 13:49:02 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes Message-ID: In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. 
Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) which cannot be handled by the backend. The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. 
Thanks, Christian ------------- Commit messages: - 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes Changes: https://git.openjdk.org/jdk/pull/25006/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25006&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355674 Stats: 56 lines in 2 files changed: 55 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25006/head:pull/25006 PR: https://git.openjdk.org/jdk/pull/25006 From epeter at openjdk.org Fri May 2 14:02:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 14:02:05 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: <-PwC3GNfGkJx8XVuhDelARrUHi6gX6GsqQJYsVzVdvE=.bf82fc8f-41da-4b36-a9e5-90c2cda94875@github.com> Message-ID: <0NNyOQ5DSPldUbnpMFcze7YnME0hL4K_onIEAdtWdNs=.2df576ca-08a8-41ec-a12c-ea5a03dfc80e@github.com> On Thu, 1 May 2025 16:01:41 GMT, Quan Anh Mai wrote: >> Hmm, maybe we can find some way to only print the bits if they give "additional information" that is not given by the ranges. Otherwise, we would not print alignment information, and that's quite a shame. > > Maybe that's worth thinking about, the thing is that bit information is exceptionally long and may not be universally interesting. I think we can think more about it later. My rough idea is that it may be better to print it when it is interesting. For example, we can print alignment properties at the memory access. 
Yes, we can always improve printing in a later RFE :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071656709 From chagedorn at openjdk.org Fri May 2 14:18:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 2 May 2025 14:18:57 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates Message-ID: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. 
Here is a snippet from the graph after applying Loop Peeling several times without the patch: ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. Thanks, Christian ------------- Commit messages: - 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates Changes: https://git.openjdk.org/jdk/pull/25007/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25007&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356084 Stats: 94 lines in 3 files changed: 79 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25007.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25007/head:pull/25007 PR: https://git.openjdk.org/jdk/pull/25007 From chagedorn at openjdk.org Fri May 2 14:18:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 2 May 2025 14:18:58 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates In-Reply-To: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: On Fri, 2 May 2025 14:14:26 GMT, Christian Hagedorn wrote: > Before the Assertion Predicate refactorings, 
we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. > > Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. > > We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). > > How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: > > ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) > > All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. 
> > This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier > is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. > > Thanks, > Christian src/hotspot/share/opto/predicates.cpp line 996: > 994: > 995: DEBUG_ONLY(initialized_assertion_predicate.verify();) > 996: template_assertion_predicate.rewire_loop_data_dependencies(cloned_template_predicate_tail, Wrongly used the Initialized Assertion Predicate tail instead of the Template Assertion Predicate tail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25007#discussion_r2071679414 From ecaspole at openjdk.org Fri May 2 15:03:46 2025 From: ecaspole at openjdk.org (Eric Caspole) Date: Fri, 2 May 2025 15:03:46 GMT Subject: RFR: 8354347: Increase the default padding size for aarch64 in JDK code. In-Reply-To: References: Message-ID: On Fri, 2 May 2025 00:49:45 GMT, Peter B. Kessler wrote: > Increase the default padding for C++ fields to avoid false sharing. This was neutral in the typical promo build perf testing I ran on OCI, but it seems worthwhile based on discussions. LGTM. ------------- Marked as reviewed by ecaspole (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24994#pullrequestreview-2812238898 From kvn at openjdk.org Fri May 2 15:33:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 May 2025 15:33:46 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs [v2] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:43:23 GMT, Marc Chevalier wrote: >> Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. 
>> >> The unmentioned ones: >> - `ccp` >> - `ciReplay` >> - `ciTypeFlow` >> - `compilercontrol` >> - `debug` >> - `oracle` >> - `predicates` >> - `print` >> - `relocations` >> - `sharedstubs` >> - `splitif` >> - `tiered` >> - `whitebox` >> >> And those, that are not test folders: >> - `lib` >> - `patches` >> - `testlibraries` >> >> I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. >> >> The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. >> >> Feel free to tell if other folders should be included (and in which tier). >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > speed up slowest test Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24817#pullrequestreview-2812307636 From epeter at openjdk.org Fri May 2 15:37:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 15:37:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 11:38:42 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add some more sanity static_asserts Making another pass from the top. Here mostly comments about `adjust_lo`. I'm excited, it is now much more readable, and I understand almost everything now :) src/hotspot/share/opto/compile.cpp line 4505: > 4503: if (sizetype != nullptr && sizetype->_hi > 0) { > 4504: index_max = sizetype->_hi - 1; > 4505: } Maybe I asked this before: where does the `sizetype->_hi > 0` come from? src/hotspot/share/opto/rangeinference.cpp line 70: > 68: // 0b1000... > 69: constexpr U mid_point = (std::numeric_limits::max() >> 1) + U(1); > 70: assert((bounds._lo < mid_point) == (bounds._hi < mid_point), "must be a simple interval"); Suggestion: assert((bounds._lo < mid_point) == (bounds._hi < mid_point), "must be a simple interval, see Lemma 4"); src/hotspot/share/opto/rangeinference.cpp line 244: > 242: - lo[x] satisfies bits for 0 <= x < i (3.1) > 243: - zeros[i] = 0 (3.2) > 244: - lo[i] = 0 (3.3) Suggestion: - lo[i] = 0 (3.3) If there is no such i, then there is also no r that is the smallest value not smaller than lo that satisfies bits. src/hotspot/share/opto/rangeinference.cpp line 251: > 249: // 0 in one_violation. Since all higher bits are 0 in zero_violation and > 250: // one_violation, we have zero_violation > one_violation. Similarly, if the > 251: // first violation violates ones, we have zero_violation < one_violation. 
Suggestion: // The algorithm depends on whether the first violation violates zeros or // ones. If it violates zeros, we have the bit being 1 in zero_violation and // 0 in one_violation. Since all higher bits are 0 in zero_violation and // one_violation, we have zero_violation > one_violation. Similarly, if the // first violation violates ones, we have zero_violation < one_violation. Might as well make it a new sentence. src/hotspot/share/opto/rangeinference.cpp line 331: > 329: // - lo[x] satisfies bits for 0 <= x < i (3.1) > 330: // - zeros[i] = 0 (3.2) > 331: // - lo[i] = 0 (3.3) Suggestion: // - lo[i] = 0 (3.3) // If there is no such i, then there is also no r that is the smallest // value not smaller than lo that satisfies bits. src/hotspot/share/opto/rangeinference.cpp line 336: > 334: // holds. However, first_violation is not the value i we are looking for > 335: // because lo[first_violation] == 1. We can also see that any larger value > 336: // of i would violate 3.1 since lo[first_violation] does not satisfy bits. Suggestion: // of i would violate (3.1) since lo[first_violation] does not satisfy bits. src/hotspot/share/opto/rangeinference.cpp line 338: > 336: // of i would violate 3.1 since lo[first_violation] does not satisfy bits. > 337: // As a result, we should find the last i upto first_violation such that > 338: // lo[i] == zeros[i] == 0. Excellent, this really sets up the bit tricks below! Nice ? src/hotspot/share/opto/rangeinference.cpp line 370: > 368: // bits. We will expand the implication of such cases below. > 369: // 0 1 1 0 0 0 0 0 > 370: U tmp = ~either & find_mask; Suggestion: // In tmp exactly those bits are set that are upto first_violation and where // lo[i] == zeros[i] == 0. Hence: The last one of these bits is at bit index i. // Note that there may not exist such bits, i.e. tmp == 0. Hence we cannot // find any i that satisfies (3.1-3.3), and so there is no value not less than // lo that satisfies bits. 
We will expand the implication of such cases below. // 0 1 1 0 0 0 0 0 U tmp = ~either & find_mask; src/hotspot/share/opto/rangeinference.cpp line 370: > 368: // bits. We will expand the implication of such cases below. > 369: // 0 1 1 0 0 0 0 0 > 370: U tmp = ~either & find_mask; Suggestion: // We want to find the last i upto first_violation such that lo[i] == zeros[i] == 0. // We start with all bits where lo[i] == zeros[i] == 0: // 0 1 1 0 0 0 0 1 U lo_eq_zeros_eq_zero = ~(lo | bits._zeros). // Now let us find all the bit indices x upto first_violation such that // lo[x] == zeros[x] == 0. The last one of these bits must be at index i. // 0 1 1 0 0 0 0 0 // Note that there may not exist such bits, i.e. tmp == 0. Hence we cannot // find any i that satisfies (3.1-3.3), and so there is no value not less than // lo that satisfies bits. We will expand the implication of such cases below. U alignment_candidates = lo_eq_zeros_eq_zero & find_mask; src/hotspot/share/opto/rangeinference.cpp line 378: > 376: // it directly without going through i. > 377: // 0 0 1 0 0 0 0 0 > 378: U alignment = tmp & (-tmp); Suggestion: // We now want to select the last one of these candidates, which is // exactly the last index i upto first_violation such that lo[i] == zeros[i] == 0. // In our example we have i == 2. // 0 0 1 0 0 0 0 0 U alignment = alignment_candidates & (-alignment_candidates); Ok, well what is still missing is how the bit trick with `-alignment_candidates` works here. Let me try: alignment_candidates = 0 1 1 0 0 0 0 0 -alignment_candidates = 1 0 1 0 0 0 0 0 Hmm yeah I'm not sure about this one.... I suppose I would have tried this via trailing zeros, but not sure about it... What is the argument here, why this extracts the last bit? src/hotspot/share/opto/rangeinference.cpp line 387: > 385: // - new_lo[i] = 1 (2.6) > 386: // - new_lo[x] = 0, for x > i (not yet 2.7) > 387: // 1 0 1 0 0 0 0 0 That's a little misleading if there is no such `i`... 
In that case `alignment = 0`, and `-alignment = 0`, and so `new_lo = 0`... which is not exactly an overflow, but smaller than `lo`, so kinda an overflow ;) src/hotspot/share/opto/rangeinference.cpp line 398: > 396: // This is the result we are looking for. > 397: // 1 0 1 0 0 0 1 1 > 398: new_lo |= bits._ones; Now we should probably also split the cases with and without candidates. Suggestion: // Assume there was at least one candidate, and i is the index of the last one: // Then there exists no value x not larger than i such that // new_lo[x] == 0 and ones[x] == 1. This is because all bits of lo before i // should satisfy bits, and new_lo[i] == 1. As a result, doing // new_lo |= bits.ones will give us a value such that: // - new_lo[x] = lo[x], for 0 <= x < i (2.5) // - new_lo[i] = 1 (2.6) // - new_lo[x] = ones[x], for x > i (2.7) // This is the result r we are looking for. // 1 0 1 0 0 0 1 1 // If there was no candidate, then above we had new_lo = 0, and the // computation below gives us new_lo = ones. new_lo |= bits._ones; src/hotspot/share/opto/rangeinference.cpp line 401: > 399: // In this case, new_lo may not always be a valid answer. This can happen > 400: // if there is no bit upto first_violation that is 0 in both lo and zeros, > 401: // i.e. tmp == 0. In such cases, alignment == 0 && lo == bits._ones. It is Suggestion: // i.e. tmp == 0. In such cases, alignment == 0 and new_lo == bits._ones. It is Or do we somehow also know that `lo = ones`? I think this was a typo, right? src/hotspot/share/opto/rangeinference.cpp line 411: > 409: // result of rounding up being 0. > 410: assert(lo < new_lo || new_lo == bits._ones, "overflow must return bits._ones"); > 411: return new_lo; Hmm, I would not say that it can return a **non valid** answer. Because we declared that it is ok to return overflowed values at the top of the method, in fact we must prove that in those cases where we cannot find an `r` we at least get `new_lo < lo`. 
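A minimal sketch of the `x & (-x)` step questioned in this review, assuming an 8-bit unsigned type purely for illustration. The helper names `negate8` and `lowest_set_bit` are hypothetical and are not part of the patch. Reading the bit rows most-significant bit first (an assumption that is consistent with the `i == 2` example), `0 1 1 0 0 0 0 0` is `0x60`, its negation is `0xA0`, and the `&` of the two leaves only `0x20`. The sketch also covers the case with no candidate, where the input is 0:

```cpp
#include <cstdint>

// Hypothetical helper: two's-complement negation in 8 bits.
// Unsigned arithmetic is modular, so 0 - x equals ~x + 1 (mod 2^8).
inline uint8_t negate8(uint8_t x) {
  return static_cast<uint8_t>(0u - x);
}

// Hypothetical helper: isolate the lowest set bit of x.
// Below the lowest set bit, both x and -x are 0.
// At the lowest set bit, both x and -x are 1.
// Above it, -x equals ~x, so the & clears those bits.
// For x == 0 the result is 0, i.e. there is no bit to select.
inline uint8_t lowest_set_bit(uint8_t x) {
  return static_cast<uint8_t>(x & negate8(x));
}
```

With `alignment_candidates = 0x60`, `negate8` gives `0xA0` and `lowest_set_bit` gives `0x20`, which matches the rows `1 0 1 0 0 0 0 0` and `0 0 1 0 0 0 0 0` quoted above.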
------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2812099778 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071663002 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071672800 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071757681 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071676461 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071759171 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071688886 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071691139 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071706093 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071721498 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071732378 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071755163 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071773264 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071778224 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071789763 From epeter at openjdk.org Fri May 2 15:37:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 15:37:09 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: Message-ID: <6ky_9MDLXFrdtvVobVuxOoT4RadbG1jOslNkaS1x92s=.4104a1ef-3fab-4857-9aef-d8a7e26308c3@github.com> On Fri, 2 May 2025 14:23:07 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add some more sanity static_asserts > > src/hotspot/share/opto/rangeinference.cpp line 338: > >> 336: // of i would violate 3.1 since lo[first_violation] does not satisfy bits. 
>> 337: // As a result, we should find the last i upto first_violation such that >> 338: // lo[i] == zeros[i] == 0. > > Excellent, this really sets up the bit tricks below! Nice! Let's try to match the wording below to this, so it is clear that we are working towards this. > src/hotspot/share/opto/rangeinference.cpp line 370: > >> 368: // bits. We will expand the implication of such cases below. >> 369: // 0 1 1 0 0 0 0 0 >> 370: U tmp = ~either & find_mask; > > Suggestion: > > // In tmp exactly those bits are set that are upto first_violation and where > // lo[i] == zeros[i] == 0. Hence: The last one of these bits is at bit index i. > // Note that there may not exist such bits, i.e. tmp == 0. Hence we cannot > // find any i that satisfies (3.1-3.3), and so there is no value not less than > // lo that satisfies bits. We will expand the implication of such cases below. > // 0 1 1 0 0 0 0 0 > U tmp = ~either & find_mask; That is the smaller suggestion, I have a larger one below. > src/hotspot/share/opto/rangeinference.cpp line 370: > >> 368: // bits. We will expand the implication of such cases below. >> 369: // 0 1 1 0 0 0 0 0 >> 370: U tmp = ~either & find_mask; > > Suggestion: > > // We want to find the last i upto first_violation such that lo[i] == zeros[i] == 0. > // We start with all bits where lo[i] == zeros[i] == 0: > // 0 1 1 0 0 0 0 1 > U lo_eq_zeros_eq_zero = ~(lo | bits._zeros). > // Now let us find all the bit indices x upto first_violation such that > // lo[x] == zeros[x] == 0. The last one of these bits must be at index i. > // 0 1 1 0 0 0 0 0 > // Note that there may not exist such bits, i.e. tmp == 0. Hence we cannot > // find any i that satisfies (3.1-3.3), and so there is no value not less than > // lo that satisfies bits. We will expand the implication of such cases below. > U alignment_candidates = lo_eq_zeros_eq_zero & find_mask; The benefit here: we have more explicit names. And the wording matches a little closer to our goal above.
> src/hotspot/share/opto/rangeinference.cpp line 387: > >> 385: // - new_lo[i] = 1 (2.6) >> 386: // - new_lo[x] = 0, for x > i (not yet 2.7) >> 387: // 1 0 1 0 0 0 0 0 > > That's a little misleading if there is no such `i`... > In that case `alignment = 0`, and `-alignment = 0`, and so `new_lo = 0`... which is not exactly an overflow, but smaller than `lo`, so kinda an overflow ;) Suggestion: // similar to aligning lo upto alignment. Also similar to the above case, // this computation cannot overflow. // We now have: // - new_lo[x] = lo[x], for 0 <= x < i (2.5) // - new_lo[i] = 1 (2.6) // - new_lo[x] = 0, for x > i (not yet 2.7) // If there is no such candidate, and no such i, then new_lo = 0. // 1 0 1 0 0 0 0 0 > src/hotspot/share/opto/rangeinference.cpp line 398: > >> 396: // This is the result we are looking for. >> 397: // 1 0 1 0 0 0 1 1 >> 398: new_lo |= bits._ones; > > Now we should probably also split the cases with and without candidates. > Suggestion: > > // Assume there was at least one candidate, and i is the index of the last one: > // Then there exists no value x not larger than i such that > // new_lo[x] == 0 and ones[x] == 1. This is because all bits of lo before i > // should satisfy bits, and new_lo[i] == 1. As a result, doing > // new_lo |= bits.ones will give us a value such that: > // - new_lo[x] = lo[x], for 0 <= x < i (2.5) > // - new_lo[i] = 1 (2.6) > // - new_lo[x] = ones[x], for x > i (2.7) > // This is the result r we are looking for. > // 1 0 1 0 0 0 1 1 > // If there was no candidate, then above we had new_lo = 0, and the > // computation below gives us new_lo = ones. > new_lo |= bits._ones; The "no candidate" case should now have an argument why this is an ok value to return. It does satisfy bits, since `ones` satisfy bits. But do we know that `ones < lo`? We need that to get the "overflow" we promised at the very top of the method. 
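The "no candidate" question above can be explored with a small brute-force reference. This is only a sketch under stated assumptions: it uses 8-bit values, the struct `KnownBits8` and the functions `satisfies_bits` and `smallest_not_below` are hypothetical names, and the predicate is taken from the PR description, `(x & zeros) == 0 && (x & ones) == ones`. Assuming `zeros & ones == 0`, `ones` itself satisfies the predicate, so whenever no value not smaller than `lo` satisfies bits, `ones < lo` has to hold.

```cpp
#include <cstdint>

// Hypothetical 8-bit stand-in for the known-bits constraint.
struct KnownBits8 {
  uint8_t zeros;  // bits that must be 0
  uint8_t ones;   // bits that must be 1
};

// x satisfies bits iff (x & zeros) == 0 && (x & ones) == ones.
inline bool satisfies_bits(uint8_t x, KnownBits8 bits) {
  return (x & bits.zeros) == 0 && (x & bits.ones) == bits.ones;
}

// Brute-force reference: the smallest value not smaller than lo that
// satisfies bits, or -1 if no such 8-bit value exists.
inline int smallest_not_below(uint8_t lo, KnownBits8 bits) {
  for (int v = lo; v <= 0xFF; ++v) {
    if (satisfies_bits(static_cast<uint8_t>(v), bits)) {
      return v;
    }
  }
  return -1;
}
```

For example, with `zeros = 0xF0` and `ones = 0x03` the satisfying values are 3, 7, 11 and 15. Starting from `lo = 5` the reference returns 7, and starting from `lo = 16` it finds nothing, in which case `ones` (3) is below `lo`.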
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071695189 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071706651 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071722541 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071763058 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071788373 From kvn at openjdk.org Fri May 2 15:37:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 2 May 2025 15:37:44 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates In-Reply-To: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: On Fri, 2 May 2025 14:14:26 GMT, Christian Hagedorn wrote: > Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. > > Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. > > We could have already fixed that with JDK-8350577 but it was simply missed. 
As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). > > How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: > > ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) > > All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. > > This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier > is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. > > Thanks, > Christian Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25007#pullrequestreview-2812315637 From epeter at openjdk.org Fri May 2 15:42:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 2 May 2025 15:42:04 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: <4J2bhK1v1UcrBUtW7tSR6KR9lcYZ2IrUYxeic6vwfZg=.2c3489ac-dd4e-4a7b-97d6-5deae5223354@github.com> Message-ID: On Fri, 2 May 2025 11:39:32 GMT, Quan Anh Mai wrote: >> @merykitty Ok, now I'm through the whole thing. I am doing this review more thorough and nit-picky than others, because it is going to be quite a substantial change, and quite at the core of many IGVN optimizations going forward. >> >> Again: I'm really impressed by the bit tricks here, and very happy that we now have quite solid explanations / proofs. >> >> Once you tell me that you responded to all comments here, I can make another pass over everything :) > > @eme64 Please let me know if you disagree with any answer from me. I am fairly confident in this patch, especially with the exhaustive tests exercising `intn_t` values. After this patch, I will work on allowing the test infrastructure to work with `Type` instances directly, templatizing `TypeInt` and `TypeLong` so that we can work with `TypeInt>`. @merykitty Ok, that is all I can do this week, enjoy the weekend. If I don't resume commenting on Monday, then feel free to ping me with a reminder.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2847522055 From duke at openjdk.org Fri May 2 16:08:53 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 2 May 2025 16:08:53 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v8] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 12:07:28 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch to constant double fields with separate micro-benchmarks > > Over all the patch looks good to me now apart from concerns around benchmark, existing Java implementation handles special cases upfront, thereby compromising the performance of most common cases. Java implementation scores above intrinsic in two outlier ranges < 2^-55 and > 22. While intrinsic implementation is performant for a meaty generic range ie. > 2^-55 and < 22.0 > We get around 30% performance uplift from intrinsic implementation over java implementation for the bulky generic input range. > For ranges above 22.0, we now see better performance in comparison to the earlier intrinsic implementation. > > New benchmark shows clear gain for the value range [A][B][C] this patch optimizes. 
> > > Baseline: > ========= > Benchmark (tanhRangeIndex) Mode Cnt Score Error Units > TanhPerf.TanhPerfConstant.tanhConstDouble1 N/A thrpt 2 117588.175 ops/ms > TanhPerf.TanhPerfConstant.tanhConstDouble21 N/A thrpt 2 117550.954 ops/ms > TanhPerf.TanhPerfConstant.tanhConstDoubleLarge N/A thrpt 2 117580.385 ops/ms => A > TanhPerf.TanhPerfConstant.tanhConstDoubleSmall N/A thrpt 2 403652.485 ops/ms > TanhPerf.TanhPerfConstant.tanhConstDoubleTiny N/A thrpt 2 408909.294 ops/ms > TanhPerf.TanhPerfRanges.tanhNegRangeDouble 0 thrpt 2 397200.032 ops/ms > TanhPerf.TanhPerfRanges.tanhNegRangeDouble 1 thrpt 2 116082.297 ops/ms > TanhPerf.TanhPerfRanges.tanhNegRangeDouble 2 thrpt 2 112213.540 ops/ms > TanhPerf.TanhPerfRanges.tanhNegRangeDouble 3 thrpt 2 433899.459 ops/ms => B > TanhPerf.TanhPerfRanges.tanhPosDoubleRange 0 thrpt 2 396818.181 ops/ms > TanhPerf.TanhPerfRanges.tanhPosDoubleRange 1 thrpt 2 115886.117 ops/ms > TanhPerf.TanhPerfRanges.tanhPosDoubleRange 2 thrpt 2 112048.023 ops/ms > TanhPerf.TanhPerfRanges.tanhPosDoubleRange 3 thrpt 2 440250.930 ops/ms => C > > WithOpt: > ======== > Benchmark (tanhRangeIndex) Mode Cnt Score Error Units > TanhPerf.TanhPerfConstant.tanhConstDouble1 N/A thrpt 2 116459.753 ops/ms > TanhPerf.TanhPerfConstant.tanhConstDouble21 N/... > @jatin-bhateja @missa-prime Is https://bugs.openjdk.org/browse/JDK-8355238 related to this bug here? JDK-8355238 and this one are related because they occur when the tanh intrinsic was introduced. However, they are distinct because JDK-8355238 covers some small input values (e.g., |x| = 0.5) whereas JDK-8348638 covers large input values (|x| > 22). 
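To make the large-input range mentioned above concrete, here is a hedged sketch of an early exit for high-magnitude inputs. It is not the intrinsic from the PR, which is x86_64 assembly. The function name `tanh_with_fast_path` is hypothetical, and the threshold of 22 is taken from the PR description, which says that inputs with `|x| >= 22` return +/- 1 quickly.

```cpp
#include <cmath>

// Hypothetical wrapper illustrating the "check large magnitudes first" idea.
// Assumption (from the PR description): for |x| >= 22 the result is +/- 1.
inline double tanh_with_fast_path(double x) {
  // A NaN input fails this comparison and falls through to std::tanh,
  // which propagates the NaN.
  if (std::fabs(x) >= 22.0) {
    // Infinities also land here and return +/- 1.
    return std::copysign(1.0, x);
  }
  return std::tanh(x);
}
```

The point of ordering the check first is that the saturated inputs skip the general computation entirely, which is the range where the thread reports the earlier intrinsic lagging behind the Java implementation.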
------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2847591198 From duke at openjdk.org Fri May 2 16:11:50 2025 From: duke at openjdk.org (duke) Date: Fri, 2 May 2025 16:11:50 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v9] In-Reply-To: References: Message-ID: On Sat, 26 Apr 2025 01:06:55 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks is included to check the performance of specific input value ranges to help prevent regressions in the future. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. >> >> For the first set of performance data collected with the new built-in range micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In all scenarios, the changes increase throughput values over _baseline1_.
The uplift over _baseline1_ is quite significant for the high value (100, 1000, 10000, 100000) scenarios. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they significantly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used. >> >> | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) | >> | :-------------------: | :-----------------: | :----------------: | :-------------------------: | >> | [-1, 1] | 103342 | 103705 | +0.35 | >> | [-2, 2] | 99977 | 100819 | +0.84 | >> | [-10, 10] | 99147 | 100240 | +1.10 | >> | [-20, 20] | 99419 | 99492 |... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Create separate tanh micro-benchmark module to avoid noise in MathBench @missa-prime Your change (at version 006eef6ac677aab91bbf015c5f1cdf2266f796d5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2847598182 From qamai at openjdk.org Fri May 2 16:25:26 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:25:26 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v60] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refine the cases where there does not exist a result ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/25a6f9b0..950a2662 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=59 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=58-59 Stats: 42 lines in 1 file changed: 6 ins; 3 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From hgreule at openjdk.org Fri May 2 16:25:26 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 2 May 2025 16:25:26 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: Message-ID: <47NDIPVnXOeKPHXEnlpX-QIrAtlBzUTyPVAhTEZbgXA=.9fe65309-37d7-4f8e-8b27-944a768b3b70@github.com> On Fri, 2 May 2025 14:52:27 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add some more sanity static_asserts > > src/hotspot/share/opto/rangeinference.cpp line 378: > >> 376: // it directly without going through i. >> 377: // 0 0 1 0 0 0 0 0 >> 378: U alignment = tmp & (-tmp); > > Suggestion: > > // We now want to select the last one of these candidates, which is > // exactly the last index i upto first_violation such that lo[i] == zeros[i] == 0. 
> // In our example we have i == 2. > // 0 0 1 0 0 0 0 0 > U alignment = alignment_candidates & (-alignment_candidates); > > Ok, well what is still missing is how the bit trick with `-alignment_candidates` works here. > > Let me try: > > alignment_candidates = 0 1 1 0 0 0 0 0 > -alignment_candidates = 1 0 1 0 0 0 0 0 > > Hmm yeah I'm not sure about this one.... > I suppose I would have tried this via trailing zeros, but not sure about it... > > What is the argument here, why this extracts the last bit? I think it gets easier when splitting the `-` into a `~` and a `+ 1`. After `~`, the trailing zeros become ones, and adding 1 has a cascading effect of producing zeros again that is stopped by the lowest zero bit (which is a one in the original value, and now is becoming a one again). Due to the `~`, all higher bits are flipped, so they are just becoming zeros when `&`-ing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071855928 From qamai at openjdk.org Fri May 2 16:25:27 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:25:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 15:24:40 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add some more sanity static_asserts > > src/hotspot/share/opto/rangeinference.cpp line 401: > >> 399: // In this case, new_lo may not always be a valid answer. This can happen >> 400: // if there is no bit upto first_violation that is 0 in both lo and zeros, >> 401: // i.e. tmp == 0. In such cases, alignment == 0 && lo == bits._ones. It is > > Suggestion: > > // i.e. tmp == 0. In such cases, alignment == 0 and new_lo == bits._ones. It is > > Or do we somehow also know that `lo = ones`? I think this was a typo, right? Yes it is a typo, I changed the wording here, though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071855272 From qamai at openjdk.org Fri May 2 16:35:28 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:35:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v61] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: refinement ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/950a2662..693cec2c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=60 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=59-60 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Fri May 2 16:35:29 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:35:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 14:04:07 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add some more sanity static_asserts > > src/hotspot/share/opto/compile.cpp line 4505: > >> 4503: if (sizetype != nullptr && sizetype->_hi > 0) { >> 4504: index_max = sizetype->_hi - 1; >> 4505: } > > Maybe I asked this before: where does the `sizetype->_hi > 0` come from? If `sizetype->_hi == 0`, there is no index that can index into this array, so the path is dead. Furthermore, this function really wants a `TypeInt` so this is the cleanest fix for me. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071867519 From qamai at openjdk.org Fri May 2 16:35:29 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:35:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: <6ky_9MDLXFrdtvVobVuxOoT4RadbG1jOslNkaS1x92s=.4104a1ef-3fab-4857-9aef-d8a7e26308c3@github.com> References: <6ky_9MDLXFrdtvVobVuxOoT4RadbG1jOslNkaS1x92s=.4104a1ef-3fab-4857-9aef-d8a7e26308c3@github.com> Message-ID: On Fri, 2 May 2025 15:30:55 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/rangeinference.cpp line 398: >> >>> 396: // This is the result we are looking for. >>> 397: // 1 0 1 0 0 0 1 1 >>> 398: new_lo |= bits._ones; >> >> Now we should probably also split the cases with and without candidates. >> Suggestion: >> >> // Assume there was at least one candidate, and i is the index of the last one: >> // Then there exists no value x not larger than i such that >> // new_lo[x] == 0 and ones[x] == 1. This is because all bits of lo before i >> // should satisfy bits, and new_lo[i] == 1. As a result, doing >> // new_lo |= bits.ones will give us a value such that: >> // - new_lo[x] = lo[x], for 0 <= x < i (2.5) >> // - new_lo[i] = 1 (2.6) >> // - new_lo[x] = ones[x], for x > i (2.7) >> // This is the result r we are looking for. >> // 1 0 1 0 0 0 1 1 >> // If there was no candidate, then above we had new_lo = 0, and the >> // computation below gives us new_lo = ones. >> new_lo |= bits._ones; > > The "no candidate" case should now have an argument why this is an ok value to return. > It does satisfy bits, since `ones` satisfy bits. > But do we know that `ones < lo`? We need that to get the "overflow" we promised at the very top of the method. You are diverting too much from the base assumption of this function. Formally, this function assumes that a result exists, which means that `i` exists, which leads to `tmp != 0`. 
The converse is also true, if `tmp != 0`, an index value `i` exists, which leads to a value not smaller than `lo` and satisfies `bits`. This implies that there does not exist one such value if and only if `tmp == 0`. In that case we know exactly that what we return satisfies bits. That's all we need to know in this section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071864811 From qamai at openjdk.org Fri May 2 16:42:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:42:12 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: <47NDIPVnXOeKPHXEnlpX-QIrAtlBzUTyPVAhTEZbgXA=.9fe65309-37d7-4f8e-8b27-944a768b3b70@github.com> References: <47NDIPVnXOeKPHXEnlpX-QIrAtlBzUTyPVAhTEZbgXA=.9fe65309-37d7-4f8e-8b27-944a768b3b70@github.com> Message-ID: On Fri, 2 May 2025 16:22:24 GMT, Hannes Greule wrote: >> src/hotspot/share/opto/rangeinference.cpp line 378: >> >>> 376: // it directly without going through i. >>> 377: // 0 0 1 0 0 0 0 0 >>> 378: U alignment = tmp & (-tmp); >> >> Suggestion: >> >> // We now want to select the last one of these candidates, which is >> // exactly the last index i upto first_violation such that lo[i] == zeros[i] == 0. >> // In our example we have i == 2. >> // 0 0 1 0 0 0 0 0 >> U alignment = alignment_candidates & (-alignment_candidates); >> >> Ok, well what is still missing is how the bit trick with `-alignment_candidates` works here. >> >> Let me try: >> >> alignment_candidates = 0 1 1 0 0 0 0 0 >> -alignment_candidates = 1 0 1 0 0 0 0 0 >> >> Hmm yeah I'm not sure about this one.... >> I suppose I would have tried this via trailing zeros, but not sure about it... >> >> What is the argument here, why this extracts the last bit? > > I think it gets easier when splitting the `-` into a `~` and a `+ 1`. 
After `~`, the trailing zeros become ones, and adding 1 has a cascading effect of producing zeros again that is stopped by the lowest zero bit (which is a one in the original value, and now is becoming a one again). Due to the `~`, all higher bits are flipped, so they are just becoming zeros when `&`-ing. `-x == ~x + 1`. From the lowest set bit of `x`, we have `x == 0b...100...0`, `~x == 0b...011...1` and `~x + 1 == 0b...100...0`. Also, the higher bits are unchanged by the `+ 1`, so for those bits `x & (-x)` is equivalent to `x & ~x`, which is 0. As a result, the only bit remaining in `x & (-x)` is the lowest set bit of `x`. This is a pretty basic bit manipulation, though: https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#BMI1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071875022 From qamai at openjdk.org Fri May 2 16:42:13 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 2 May 2025 16:42:13 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: <6ky_9MDLXFrdtvVobVuxOoT4RadbG1jOslNkaS1x92s=.4104a1ef-3fab-4857-9aef-d8a7e26308c3@github.com> Message-ID: On Fri, 2 May 2025 16:29:59 GMT, Quan Anh Mai wrote: >> The "no candidate" case should now have an argument why this is an ok value to return. >> It does satisfy bits, since `ones` satisfy bits. >> But do we know that `ones < lo`? We need that to get the "overflow" we promised at the very top of the method. > > You are diverting too much from the base assumption of this function. Formally, this function assumes that a result exists, which means that `i` exists, which leads to `tmp != 0`. The converse is also true, if `tmp != 0`, an index value `i` exists, which leads to a value not smaller than `lo` and satisfies `bits`. This implies that there does not exist one such value if and only if `tmp == 0`. In that case we know exactly that what we return satisfies bits. That's all we need to know in this section.
I changed the comment at the return point of this function to highlight this fact more clearly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2071877164 From sviswanathan at openjdk.org Fri May 2 17:01:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 May 2025 17:01:53 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:54:09 GMT, Volodymyr Paprotski wrote: > It looks like the `permv` mask isnt always 'all-ones' or 'all-zeroes'. (Which is OK for real blend, but needs to be enforced via the flag for blend emulation) > > Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine) > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP >>> jtreg:test/jdk/jdk/incubator/vector 83 71 10 0 2 << > ============================== > TEST FAILURE > > After the fix: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS > > And on an AVX512 machine: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24645#pullrequestreview-2812496481 From sviswanathan at openjdk.org Fri May 2 17:05:47 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 May 2025 17:05:47 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:17:44 GMT, Jatin Bhateja wrote: >>> @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. Point form in attempt to be concise: >> - Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> - The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> - (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > >> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. Point form in attempt to be concise: >> >> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. 
In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > > Hi @vpaprotsk , @eme64, > > Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index. > > Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. > > > Best Regards, > Jatin > @jatin-bhateja It seems the flag `-XX:+EnableX86ECoreOpts` only is enabled on some very specific machines. How important / wide spread are these machines? Will they become more wide spread over time? Or is this rather rare, and not worth investing too many resources? How does their importance compare to AVX and AVX2, or machines with only SSE2 or SSE4.1? Because we put a focus on SSE/AVX in internal testing, but I'm wondering if we should also test `EnableX86ECoreOpts` more. How does this flag interact with AVX features? Do ECore machines always have AVX2 for example? What would be good flag combinations here? Testing with EnableX86ECoreOpts would be good, these machines have AVX2. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2847696929 From duke at openjdk.org Fri May 2 17:25:04 2025 From: duke at openjdk.org (Mohamed Issa) Date: Fri, 2 May 2025 17:25:04 GMT Subject: Integrated: 8348638: Performance regression in Math.tanh In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:44:32 GMT, Mohamed Issa wrote: > The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. Additionally, a new set of micro-benchmarks are included to check the performance of specific input value ranges to help prevent regressions in the future. > > 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. > 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. > > The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. The term _baseline1_ refers to runs with the intrinsic enabled and _baseline2_ refers to runs with the intrinsic disabled. > > For the first set of performance data collected with the new built-in range micro-benchmark, see the tables below. Each result is the mean of 8 individual runs, and the input ranges used match those in the bug report with two additional ones included. In all scenarios, the changes increase throughput values over _baseline1_.
The uplift over _baseline1_ is quite significant for the high value (100, 1000, 10000, 100000) scenarios. When comparing against _baseline2_, the changes have significant uplift with the lower value inputs (1, 2, 10, 20, 100). However, they significantly lag behind _baseline2_ when the high value inputs (1000, 10000, 100000) are used. > > | Input range(s) | Baseline1 (ops/s) | Change (ops/s) | Change vs baseline1 (%) | > | :-------------------: | :-----------------: | :----------------: | :-------------------------: | > | [-1, 1] | 103342 | 103705 | +0.35 | > | [-2, 2] | 99977 | 100819 | +0.84 | > | [-10, 10] | 99147 | 100240 | +1.10 | > | [-20, 20] | 99419 | 99492 | +0.07 ... This pull request has now been integrated. Changeset: c8bbcaf5 Author: Mohamed Issa Committer: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/c8bbcaf5de6982f673504a8dc766fb80bb6f0d07 Stats: 178 lines in 2 files changed: 160 ins; 7 del; 11 mod 8348638: Performance regression in Math.tanh Reviewed-by: jbhateja, epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/23889 From vpaprotski at openjdk.org Fri May 2 17:30:48 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Fri, 2 May 2025 17:30:48 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:42:18 GMT, Emanuel Peter wrote: >>> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >>> >>> @eme64 Thanks for looking. Point form in attempt to be concise: >>> >>> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. 
see https://bugs.openjdk.org/browse/JDK-8354473) >>> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >>> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. >> >> Hi @vpaprotsk , @eme64, >> >> Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index. >> >> Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. >> >> >> Best Regards, >> Jatin > > @jatin-bhateja It seems the flag `-XX:+EnableX86ECoreOpts` only is enabled on some very specific machines. How important / wide spread are these machines? Will they become more wide spread over time? Or is this rather rare, and not worth investing too many resources? How does their importance compare to AVX and AVX2, or machines with only SSE2 or SSE4.1? Because we put a focus on SSE/AVX in internal testing, but I'm wondering if we should also test `EnableX86ECoreOpts` more. How does this flag interact with AVX features? Do ECore machines always have AVX2 for example? What would be good flag combinations here? @eme64 Pinging as promised about tests results.. thanks! Re: `EnableX86ECoreOpts` testing.. the option currently 'protects' some fairly model-specific optimizations. 
Currently only `test/jdk/java/lang/String/IndexOf.java` calls out for it specifically, which leads to some 'false positives'.. or rather falsely blaming String.indexof. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2847738656 From ayang at openjdk.org Fri May 2 18:41:53 2025 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 2 May 2025 18:41:53 GMT Subject: RFR: 8350621: Code cache stops scheduling GC In-Reply-To: References: Message-ID: On Sun, 16 Feb 2025 18:39:29 GMT, Alexandre Jacob wrote: > The purpose of this PR is to fix a bug where we can end up in a situation where the GC is not scheduled anymore by `CodeCache`. > > This situation is possible because the `_unloading_threshold_gc_requested` flag is set to `true` when triggering the GC and we expect the GC to call `CodeCache::on_gc_marking_cycle_finish` which in turn will call `CodeCache::update_cold_gc_count`, which will reset the flag `_unloading_threshold_gc_requested` allowing further GC scheduling. > > Unfortunately this can't work properly under certain circumstances. > For example, if using G1GC, calling `G1CollectedHeap::collect` does not guarantee that the GC will actually run as it can be already running (see [here](https://github.com/openjdk/jdk/blob/7d11418c820b46926a25907766d16083a4b349de/src/hotspot/share/gc/g1/g1CollectedHeap.cpp#L1763)). > > I have observed this behavior on JVMs running version 21 that were recently migrated from Java 17. > Those JVMs have some pressure on code cache and quite a large heap in comparison to allocation rate, which means that objects are mostly GC'd by young collections and full GCs take a long time to happen. > > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well.
> > In order to reproduce this issue, I found a very simple and convenient way: > > > public class CodeCacheMain { > public static void main(String[] args) throws InterruptedException { > while (true) { > Thread.sleep(100); > } > } > } > > > Run this simple app with the following JVM flags: > > > -Xlog:gc*=info,codecache=info -Xmx512m -XX:ReservedCodeCacheSize=2496k -XX:StartAggressiveSweepingAt=15 > > > - 512m for the heap just to clarify the intent that we don't want to be bothered by a full GC > - low `ReservedCodeCacheSize` to put pressure on code cache quickly > - `StartAggressiveSweepingAt` can be set to 20 or 15 for faster bug reproduction > > Itself, the program will hardly get pressure on code cache, but the good news is that it is sufficient to attach a jconsole on it which will: > - allows us to monitor code cache > - indirectly generate activity on the code cache, just what we need to reproduce the bug > > Some logs related to code cache will show up at some point with GC activity: > > > [648.733s][info][codecache ] Triggering aggressive GC due to having only 14.970% free memory > > > And then it will stop and we'll end up with the following message: > > > [672.714s][info][codecache ] Code cache is full - disabling compilation > > > L... I have a question regarding the existing code/logic. // In case the GC is concurrent, we make sure only one thread requests the GC. if (Atomic::cmpxchg(&_unloading_threshold_gc_requested, false, true) == false) { log_info(codecache)("Triggering aggressive GC due to having only %.3f%% free memory", free_ratio * 100.0); Universe::heap()->collect(GCCause::_codecache_GC_aggressive); } Why making sure only one thread calls `collect(...)`? I believe this API can be invoked concurrently. Would removing `_unloading_threshold_gc_requested` resolve this problem? > I have been able to reproduce this issue with ParallelGC and G1GC, and I imagine that other GC can be impacted as well. 
For ParallelGC, `ParallelScavengeHeap::collect` contains the following to ensure `System.gc` gccause and similar ones guarantee a full-gc. if (!GCCause::is_explicit_full_gc(cause)) { return; } However, the current logic that a young-gc can cancel a full-gc (`_codecache_GC_aggressive` in this case) also seems surprising. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23656#issuecomment-2847860414 From pbk at openjdk.org Fri May 2 18:56:49 2025 From: pbk at openjdk.org (Peter B. Kessler) Date: Fri, 2 May 2025 18:56:49 GMT Subject: Integrated: 8354347: Increase the default padding size for aarch64 in JDK code. In-Reply-To: References: Message-ID: On Fri, 2 May 2025 00:49:45 GMT, Peter B. Kessler wrote: > Increase the default padding for C++ fields to avoid false sharing. This pull request has now been integrated. Changeset: 60ba81d7 Author: Peter B. Kessler URL: https://git.openjdk.org/jdk/commit/60ba81d77f0e299b8131cf23b1253689fa898e85 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8354347: Increase the default padding size for aarch64 in JDK code. Reviewed-by: aph, ecaspole ------------- PR: https://git.openjdk.org/jdk/pull/24994 From sviswanathan at openjdk.org Fri May 2 20:54:47 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 May 2025 20:54:47 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 11:31:01 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2).
This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Refactoring code to create a seperate VM_Features class src/hotspot/cpu/x86/vm_version_x86.cpp line 464: > 462: __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx > 463: __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx > 464: __ jccb(Assembler::equal, done); // jump if AVX is not supported This does not have the same effect as before. Consider the input 0x10000000: the andl result will not be zero with this code, so the jump to done will not happen. Prior to this change, the cmpl with 0x18000000 would fail for equality, so the jump to done would happen. This is the case in all the places where we are checking more than one set bit.
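The difference between the two checks described in the comment above can be sketched in plain C++. This is only an illustration under assumptions: `kOsxsaveAvxMask`, `any_bit_set` and `all_bits_set` are hypothetical names, not HotSpot code, and the functions merely model what a test of the `andl` result against zero versus an additional `cmpl` against the mask would decide.

```cpp
#include <cstdint>

// Hypothetical constant mirroring the quoted 0x18000000 (cpuid1 bits osxsave | avx).
constexpr uint32_t kOsxsaveAvxMask = 0x18000000u;

// Models `andl` followed by a jump on the zero flag:
// true as soon as ANY of the masked bits is set.
constexpr bool any_bit_set(uint32_t value, uint32_t mask) {
  return (value & mask) != 0;
}

// Models `andl` followed by `cmpl` against the mask:
// true only when ALL of the masked bits are set.
constexpr bool all_bits_set(uint32_t value, uint32_t mask) {
  return (value & mask) == mask;
}

// With only one of the two bits present, the two checks disagree.
static_assert(any_bit_set(0x10000000u, kOsxsaveAvxMask),
              "a single masked bit makes the and-result non-zero");
static_assert(!all_bits_set(0x10000000u, kOsxsaveAvxMask),
              "but the equality check against the full mask fails");
```

For a mask with a single bit the two predicates coincide, which is presumably why the concern is limited to the places that test more than one bit at once.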
src/hotspot/cpu/x86/vm_version_x86.cpp line 468: > 466: __ movl(rax, 0x6); > 467: __ andl(rax, Address(rbp, in_bytes(VM_Version::xem_xcr0_offset()))); // xcr0 bits sse | ymm > 468: __ jccb(Assembler::notEqual, start_simd_check); // return if AVX is not supported See prior comment, need the cmpl and jmp here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072134109 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072136639 From vlivanov at openjdk.org Fri May 2 20:59:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 2 May 2025 20:59:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 11:31:01 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Refactoring code to create a seperate VM_Features class Jatin, are you done with the refactorings? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848107604 From sparasa at openjdk.org Fri May 2 22:23:00 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 2 May 2025 22:23:00 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v15] In-Reply-To: References: Message-ID: <1BuovDH-emkhBK02ZeqXsh24E7LyLb2cNbpfLn8M99I=.3853015b-2a18-4fd9-a85c-08881b632491@github.com> > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with six additional commits since the last revision: - replace evex_opcode_and_int16_nf with evex_opcode_prefix_and_encode - Remove unused functions - replace evex_opcode_and_int16_ndd with evex_opcode_prefix_and_encode_swap - replace evex_opcode_int24_ndd with evex_opcode_prefix_and_encode - Replace evex_opcode_int16_ndd with evex_opcode_prefix_encoding - refactor emit_arith and evex_prefix_int8_operand ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/6a01e747..a1bea4e4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=13-14 Stats: 180 lines in 2 files changed: 41 ins; 53 del; 86 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Fri May 2 22:38:25 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 2 May 2025 22:38:25 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v16] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add /*use_prefixq*/ comment next to boolean literal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/a1bea4e4..386ebe41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=14-15 Stats: 22 lines in 1 file changed: 0 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Fri May 2 22:48:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 2 May 2025 22:48:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v17] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: more clarifying comments next to boolean literals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/386ebe41..d1c1b077 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=15-16 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sviswanathan at openjdk.org Fri May 2 23:14:51 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 May 2025 23:14:51 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v17] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 22:48:02 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > more clarifying comments next to boolean literals src/hotspot/cpu/x86/assembler_x86.hpp line 807: > 805: > 806: void evex_prefix_int8_operand_ndd(Register dst, Register src1, Address src2, VexSimdPrefix pre, VexOpcode opc, > 807: InstructionAttr *attributes, int byte1, bool use_prefixq = false, bool no_flags = false, bool is_map1 = true); Looks like this function is not being used and could be removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2072228933 From sviswanathan at openjdk.org Fri May 2 23:37:47 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 2 May 2025 23:37:47 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v17] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 22:48:02 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > more clarifying comments next to boolean literals src/hotspot/cpu/x86/assembler_x86.cpp line 4403: > 4401: emit_arith(0x0B, 0xC0, dst, src); > 4402: } > 4403: No need to add this function now. src/hotspot/cpu/x86/assembler_x86.cpp line 12979: > 12977: } > 12978: emit_operand(src1, src2, 0); > 12979: } This function could be removed, not used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2072236026 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2072236681 From iveresov at openjdk.org Sat May 3 01:13:41 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sat, 3 May 2025 01:13:41 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v11] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with two additional commits since the last revision: - Fix additional issues - Make sure command line flags that affect MDO layout are consistent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/014b0ec5..9676039c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=09-10 Stats: 54 lines in 3 files changed: 52 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From qamai at openjdk.org Sat May 3 01:23:18 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 3 May 2025 01:23:18 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v62] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
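To make the constraint set in the description above concrete, here is a minimal sketch of it as a membership predicate for the 32-bit case. It is an illustration only: `IntConstraints`, `contains` and `kZeroToThree` are hypothetical names and this is not the HotSpot `TypeInt` implementation. The example instance encodes the `[0, 3]` case from the description, where the signed range, the unsigned range and the known-zero bits all describe the same set.

```cpp
#include <cstdint>

// Hypothetical illustration type, not the HotSpot TypeInt class.
struct IntConstraints {
  int32_t  lo, hi;       // signed bounds:   x s>= lo && x s<= hi
  uint32_t ulo, uhi;     // unsigned bounds: x u>= ulo && x u<= uhi
  uint32_t zeros, ones;  // bits known to be 0 and bits known to be 1
};

// True if x satisfies every constraint listed in the description.
constexpr bool contains(const IntConstraints& t, int32_t x) {
  return x >= t.lo && x <= t.hi &&
         static_cast<uint32_t>(x) >= t.ulo &&
         static_cast<uint32_t>(x) <= t.uhi &&
         (static_cast<uint32_t>(x) & t.zeros) == 0 &&
         (static_cast<uint32_t>(x) & t.ones) == t.ones;
}

// The [0, 3] example: same range in both domains, and every bit
// except the lowest two is known to be zero.
constexpr IntConstraints kZeroToThree{0, 3, 0u, 3u, ~3u, 0u};

static_assert(contains(kZeroToThree, 2), "2 lies in [0, 3]");
static_assert(!contains(kZeroToThree, 4), "4 violates hi, uhi and zeros");
```

In this instance the three kinds of constraints are already mutually tight; canonicalization, as described, is about reaching such a form when the inputs are not.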
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: alignment wording ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/693cec2c..56ffe4f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=61 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=60-61 Stats: 12 lines in 1 file changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From asmehra at openjdk.org Sat May 3 04:21:31 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Sat, 3 May 2025 04:21:31 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache Message-ID: [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. ------------- Commit messages: - Remove irrelevant comment - Fix win64 compile failures - Fix AOTCodeFlags.java test - Fix compile failure in minimal config - Revert back changes that added AOTRuntimeConstants. 
- Fix merge conflicts - Store/load AsmRemarks and DbgStrings in aot code cache - Add missing external address in aarch64 - 8354887: Preserve runtime blobs in AOT code cache Changes: https://git.openjdk.org/jdk/pull/25019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354887 Stats: 1048 lines in 22 files changed: 815 ins; 132 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/25019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25019/head:pull/25019 PR: https://git.openjdk.org/jdk/pull/25019 From iveresov at openjdk.org Sat May 3 05:25:35 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Sat, 3 May 2025 05:25:35 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v12] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Fix compile ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/9676039c..2441ad71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=10-11 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From jbhateja at openjdk.org Sat May 3 07:26:29 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 07:26:29 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. 
State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/a9258174..051c416c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=08-09 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Sat May 3 07:32:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 07:32:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 20:57:17 GMT, Vladimir Ivanov wrote: > Jatin, are you done with the refactorings? @iwanowww, I have addressed your comments. Let me know if you have further comments / feedback. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848484313

From jbhateja at openjdk.org Sat May 3 07:32:47 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 07:32:47 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9]
In-Reply-To:
References:
Message-ID:

On Fri, 2 May 2025 20:47:01 GMT, Sandhya Viswanathan wrote:

>> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>> Refactoring code to create a seperate VM_Features class
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 464:
>
>> 462: __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx
>> 463: __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx
>> 464: __ jccb(Assembler::equal, done); // jump if AVX is not supported
>
> This does not have the same effect as before. Consider input is 0x10000000, the andl result will not be zero with this code and so jump to done will not happen. Whereas prior to this change, the cmpl with 0x18000000 will fail for equality and so a jump to done will happen. This is the case for all the places where we are checking more than 1 set bit.

Thanks @sviswa7. The sub-optimality was mainly around single-bit comparisons, where we could save the redundant CMP after the AND by flipping the predicate of the subsequent flag-consuming JMP. Multi-bit compares should remain unaltered.
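
For readers following the AND/CMP discussion above, here is a minimal standalone sketch of why the two checks agree for a single-bit mask but diverge for a multi-bit one. It is plain C++ rather than HotSpot assembler code, and the helper names `all_bits_set` and `any_bit_set` are hypothetical, chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of HotSpot.

// Models "andl + cmpl mask": true only when every bit of the mask is set.
constexpr bool all_bits_set(uint32_t value, uint32_t mask) {
  return (value & mask) == mask;
}

// Models "andl + test of the zero flag": true when at least one mask bit is set.
constexpr bool any_bit_set(uint32_t value, uint32_t mask) {
  return (value & mask) != 0;
}

// Mask taken from the quoted snippet (cpuid1 bits osxsave | avx).
constexpr uint32_t kOsxsaveAvxMask = 0x18000000u;
// Input from the review comment: only one of the two mask bits is set.
constexpr uint32_t kOneBitOnly = 0x10000000u;

// Multi-bit mask: the two checks disagree on a partially matching input.
static_assert(!all_bits_set(kOneBitOnly, kOsxsaveAvxMask), "cmpl-style check rejects it");
static_assert(any_bit_set(kOneBitOnly, kOsxsaveAvxMask), "zero-flag-style check accepts it");

// Single-bit mask: both checks give the same answer, so the CMP is redundant.
static_assert(all_bits_set(kOneBitOnly, 0x10000000u) == any_bit_set(kOneBitOnly, 0x10000000u),
              "single-bit checks agree");
```

The `static_assert` lines only restate the case raised in the review (input 0x10000000 against mask 0x18000000); they make no claim about the actual instruction sequence emitted by the patch.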
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072341101 From vlivanov at openjdk.org Sat May 3 07:44:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 3 May 2025 07:44:48 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 07:26:29 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Ok, thanks! I wasn't sure you finished the pass. 
I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backed array. (Also, `pre_initialize()` is not needed as well.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848488960 From jbhateja at openjdk.org Sat May 3 07:45:45 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 07:45:45 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:17:44 GMT, Jatin Bhateja wrote: >>> @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. Point form in attempt to be concise: >> - Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> - The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> - (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > >> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >> >> @eme64 Thanks for looking. Point form in attempt to be concise: >> >> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. 
I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. > > Hi @vpaprotsk , @eme64, > > Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index. > > Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. > > > Best Regards, > Jatin > > @jatin-bhateja It seems the flag `-XX:+EnableX86ECoreOpts` only is enabled on some very specific machines. How important / wide spread are these machines? Will they become more wide spread over time? Or is this rather rare, and not worth investing too many resources? How does their importance compare to AVX and AVX2, or machines with only SSE2 or SSE4.1? Because we put a focus on SSE/AVX in internal testing, but I'm wondering if we should also test `EnableX86ECoreOpts` more. How does this flag interact with AVX features? Do ECore machines always have AVX2 for example? What would be good flag combinations here? > > Testing with EnableX86ECoreOpts would be good, these machines have AVX2. 
>>> How important / wide spread are these machines? Will they become more wide spread over time? Or is this rather rare, and not worth investing too many resources?

E-core Xeons, Sapphire Rapids is widely deployed by all major CSPs. -XX:+EnableX86ECoreOpts enables certain micro-architectural optimizations for these systems, and the JIT code may differ from what -XX:UseAVX=2 produces on regular P-core Xeon (AVX512 family) targets.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2848489508

From jbhateja at openjdk.org Sat May 3 07:54:46 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 07:54:46 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 07:41:43 GMT, Vladimir Ivanov wrote:

> Ok, thanks! I wasn't sure you finished the pass.
>
> I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backed array. (Also, `pre_initialize()` is not needed as well.)

pre_initialize was put in place because codeCache_init() precedes VM_Version_init() and makes calls to some assembler routines which check for the existence of certain target features. It's an ordering issue: pre_initialize simply allocates the feature vector upfront to prevent crashing.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848492777 From vlivanov at openjdk.org Sat May 3 07:54:47 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 3 May 2025 07:54:47 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 07:26:29 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution src/hotspot/cpu/x86/vm_version_x86.cpp line 2867: > 2865: > 2866: uint64_t VM_Version::CpuidInfo::feature_flags() const { > 2867: uint64_t result = 0; It's unfortunate you migrated away from operating on a local copy. Why don't you declare a local copy (`VM_Version result`) and migrate bit manipulation to bit field accessors on it? `VM_Version::CpuidInfo::feature_flags()` can still return it by value (once you get rid of heap memory allocation, copying becomes trivial). src/hotspot/share/runtime/abstract_vm_version.hpp line 88: > 86: static VM_Features _dynamic_cpu_features; > 87: > 88: #define SET_CPU_FEATURE(feature) \ Why don't you supersede macros with instance methods on `VM_Version` instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072344671 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072343204 From jbhateja at openjdk.org Sat May 3 07:57:45 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 07:57:45 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 07:52:45 GMT, Jatin Bhateja wrote: > Ok, thanks! I wasn't sure you finished the pass. > > I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backed array. (Also, `pre_initialize()` is not needed as well.) 
I made it dynamic to keep it flexible, since the bitmap size depends on the maximum feature enum value.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848493614

From jbhateja at openjdk.org Sat May 3 08:08:46 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sat, 3 May 2025 08:08:46 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 07:52:21 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments resolution
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 2867:
>
>> 2865:
>> 2866: uint64_t VM_Version::CpuidInfo::feature_flags() const {
>> 2867: uint64_t result = 0;
>
> It's unfortunate you migrated away from operating on a local copy. Why don't you declare a local copy (`VM_Version result`) and migrate bit manipulation to bit field accessors on it? `VM_Version::CpuidInfo::feature_flags()` can still return it by value (once you get rid of heap memory allocation, copying becomes trivial).

The new implementation directly modifies the feature vector bits through macros.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072346669

From vlivanov at openjdk.org Sat May 3 08:17:49 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 3 May 2025 08:17:49 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com>

On Sat, 3 May 2025 07:55:10 GMT, Jatin Bhateja wrote:

> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit?

Yes, please.
(The limit may be precise - number of elements in Feature_Flag enum - but the logic which computes the size of backing array can automatically round it and bump the size once the actual limit is reached.)

> pre_initialize was put in place because codeCache_init() precedes VM_Version_init()

I wanted to say that the sole purpose of `pre_initialize` is to allocate memory. Once the allocation goes away, there's no reason to keep `pre_initialize`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2848507499

From vlivanov at openjdk.org Sat May 3 08:28:47 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Sat, 3 May 2025 08:28:47 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To:
References:
Message-ID:

On Sat, 3 May 2025 08:06:10 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/vm_version_x86.cpp line 2867:
>>
>>> 2865:
>>> 2866: uint64_t VM_Version::CpuidInfo::feature_flags() const {
>>> 2867: uint64_t result = 0;
>>
>> It's unfortunate you migrated away from operating on a local copy. Why don't you declare a local copy (`VM_Version result`) and migrate bit manipulation to bit field accessors on it? `VM_Version::CpuidInfo::feature_flags()` can still return it by value (once you get rid of heap memory allocation, copying becomes trivial).
>
> The new implementation directly modifies the feature vector bits through macros.

I prefer explicit accessor calls on corresponding instance fields.

It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros.
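
As a rough illustration of the suggestion in this thread (inline storage sized at compile time, explicit accessors, and detection that returns a value), here is a minimal sketch. It is not the code in the PR: the class name `FeatureBits`, the function `detect_features`, and the shortened `Feature_Flag` enum below are all hypothetical stand-ins.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical, shortened feature list; the real enum is much longer.
enum Feature_Flag : uint32_t {
  CPU_SSE,
  CPU_AVX,
  CPU_AVX10_1,
  CPU_AVX10_2,
  MAX_CPU_FEATURES  // number of features, used only to size the bitmap
};

// Feature bitmap with inline storage: no heap allocation, trivially copyable.
class FeatureBits {
  static constexpr size_t bits_per_word = 64;
  // Rounds up, so the array grows automatically when the enum grows.
  static constexpr size_t word_count =
      (MAX_CPU_FEATURES + bits_per_word - 1) / bits_per_word;

  uint64_t _words[word_count] = {};

 public:
  void set(Feature_Flag f) {
    _words[f / bits_per_word] |= (uint64_t{1} << (f % bits_per_word));
  }
  void clear(Feature_Flag f) {
    _words[f / bits_per_word] &= ~(uint64_t{1} << (f % bits_per_word));
  }
  bool test(Feature_Flag f) const {
    return ((_words[f / bits_per_word] >> (f % bits_per_word)) & 1) != 0;
  }
};

// Stand-in for a detection routine: it fills a local copy through explicit
// accessor calls and returns it by value instead of touching global state.
inline FeatureBits detect_features(bool has_avx) {
  FeatureBits result;
  result.set(CPU_SSE);
  if (has_avx) {
    result.set(CPU_AVX);
  }
  return result;
}
```

Because the storage is a plain member array, copying a `FeatureBits` copies the bits themselves, so two copies can never share a backing array, and nothing needs to be allocated before the first use.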
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072349610 From jbhateja at openjdk.org Sat May 3 08:33:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 3 May 2025 08:33:46 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 08:26:19 GMT, Vladimir Ivanov wrote: >> New implimentation directly modify the feature vector bits though macros. > > I prefer explicit accessor calls on corresponding instance fields. > > It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros. VM_Version::CpuidInfo::feature_flags() is local to x86 targets, how about changing its name to VM_Version::CpuidInfo::install_feature_flags() and use macros ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072350359 From jkarthikeyan at openjdk.org Sat May 3 17:13:32 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Sat, 3 May 2025 17:13:32 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12] In-Reply-To: References: Message-ID: > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. 
I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                                 Baseline                        Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!

Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:

  Whitespace and benchmark tweak

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23413/files
  - new: https://git.openjdk.org/jdk/pull/23413/files/8c00ef84..03ee1154

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=11
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=10-11

  Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/23413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413

PR: https://git.openjdk.org/jdk/pull/23413

From jkarthikeyan at openjdk.org Sat May 3 17:32:49 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Sat, 3 May 2025 17:32:49 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12]
In-Reply-To:
References:
Message-ID: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com>

On Sat, 3 May 2025 17:13:32 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization.
This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                                 Baseline                        Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Whitespace and benchmark tweak

Thanks a lot for running the benchmark on your AVX512 machine! The results are very interesting, in the char cases it looks like we over-unroll the loop with SuperWord enabled even though we don't end up vectorizing the loop, fixing that could solve the slowdown. Since you mentioned the unroll amount was 32x, it might be unrolling to fill a vector (`512/sizeof(char) = 32`).

> Wait, but you seem to say that you want to support `casting to T_CHAR`. But is the issue not casting FROM char?

You are correct, I think that is my mistake. It looks like casting to char is supported because stores to both short and char become `StoreC`, but casting from char isn't supported because we have no `VectorCastC2X` node. I'll update the bug to make it more accurate.
I've also pushed a small commit to remove some extra whitespace and to make the benchmark run faster. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2848723503 From kvn at openjdk.org Sat May 3 17:46:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 May 2025 17:46:49 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Sat, 3 May 2025 04:10:01 GMT, Ashutosh Mehra wrote: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. Few comments. src/hotspot/cpu/x86/x86_64.ad line 1868: > 1866: } else { > 1867: __ mov64(r10, (int64_t) $meth$$method); > 1868: } I think we should do it always, not conditionally. On AArch64 it is unconditional - relocation processing know how to do that. src/hotspot/share/asm/codeBuffer.hpp line 303: > 301: os::free((void*)_string); > 302: _string = nullptr; > 303: } Move it into `.cpp` so you don't need to include `os.hpp` here. src/hotspot/share/code/aotCodeCache.cpp line 27: > 25: #ifdef COMPILER1 > 26: #include "c1/c1_Runtime1.hpp" > 27: #endif Conditional includes are placed at the end of includes. src/hotspot/share/code/aotCodeCache.cpp line 414: > 412: log_debug(aot, codecache, init)("AOT Code Cache disabled: it was created with CompressedOops::base() = %p vs current %p", _compressedOopBase, CompressedOops::base()); > 413: return false; > 414: } I think we have relocation for CompressedOops::base() so we can patch. No need to bailout. Do you have stub/blob which missed relocation? 
src/hotspot/share/code/aotCodeCache.cpp line 422:

> 420: log_debug(aot, codecache, init)("AOT Code Cache disabled: it was created with CompressedKlassPointers::base() = %p vs current %p", _compressedKlassBase, CompressedKlassPointers::base());
> 421: return false;
> 422: }

I would suggest to use relocation for klass's base too but not in these changes. I think bailout is fine here.

src/hotspot/share/code/aotCodeCache.cpp line 774:

> 772: return false;
> 773: }
> 774: log_info(aot, codecache, stubs)("Writing blob '%s' (id=%u, kind=%s) to AOT Code Cache", name, id, AOTCodeEntry::kind_string(entry_kind));

Please, use `log_debug()` in final changes.

src/hotspot/share/code/aotCodeCache.cpp line 880:

> 878: CodeBlob* blob = reader.compile_code_blob(name, entry_offset_count, entry_offsets);
> 879:
> 880: log_info(aot, codecache, stubs)("Read blob '%s' (id=%u, kind=%s) from AOT Code Cache", name, id, AOTCodeEntry::kind_string(entry_kind));

Use `log_debug()`

src/hotspot/share/code/aotCodeCache.cpp line 1119:

> 1117: uint n = write_bytes(&offset, sizeof(uint));
> 1118: if (n != sizeof(uint)) {
> 1119: return false;

Consider using `id_for_C_string()` and recording the ID instead of copying the string. These strings should be recorded in the C strings table already. If `id_for_C_string()` does not find the string, assert. We should add `add_C_string()` where it is missing.

src/hotspot/share/code/aotCodeCache.cpp line 1158:

> 1156: log_trace(aot, codecache, stubs)("dbg string=%s", str);
> 1157: uint len = (uint)strlen(str) + 1; // including '\0' char
> 1158: uint n = write_bytes(str, len);

Same here

src/hotspot/share/opto/runtime.cpp line 161:

> 159: C2_STUB_C_FUNC(name), \
> 160: C2_STUB_NAME(name), \
> 161: (int)C2_STUB_ID(name), \

Please, align `\`

src/hotspot/share/opto/runtime.cpp line 175:

> 173: C2_JVMTI_STUB_C_FUNC(name), \
> 174: C2_STUB_NAME(name), \
> 175: (int)C2_STUB_ID(name), \

Here too.
------------- PR Review: https://git.openjdk.org/jdk/pull/25019#pullrequestreview-2813266337 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072427336 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072427806 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072431635 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072433399 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072433955 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072434259 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072434351 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072435358 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072435774 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072428528 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072428575 From kvn at openjdk.org Sat May 3 17:46:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 May 2025 17:46:50 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Sat, 3 May 2025 16:44:52 GMT, Vladimir Kozlov wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > src/hotspot/cpu/x86/x86_64.ad line 1868: > >> 1866: } else { >> 1867: __ mov64(r10, (int64_t) $meth$$method); >> 1868: } > > I think we should do it always, not conditionally. On AArch64 it is unconditional - relocation processing know how to do that. 
I will update `premain` code too later. > src/hotspot/share/asm/codeBuffer.hpp line 303: > >> 301: os::free((void*)_string); >> 302: _string = nullptr; >> 303: } > > Move it into `.cpp` so you don't need to include `os.hpp` here. Constructor too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072427496 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2072427920 From dnsimon at openjdk.org Sat May 3 19:37:48 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 3 May 2025 19:37:48 GMT Subject: RFR: 8355896: Lossy narrowing cast of JVMCINMethodData::size In-Reply-To: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> References: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> Message-ID: On Wed, 30 Apr 2025 13:10:19 GMT, Boris Ulasevich wrote: > In https://github.com/openjdk/jdk/pull/21276 mutable_data, which includes relocations, metadata, and jvmci_data, was moved to a separately malloc'ed blob. The nmethod (a CodeBlob) holds a pointer to the mutable_data blob and stores its internal offsets. > > As part of that change, I reused the former uint16_t offset field to store jvmci_data_size. This turned out to be incorrect, since jvmci_data can exceed 64 KB (as shown in https://github.com/openjdk/jdk/pull/24753). > > The most direct fix would be to change jvmci_data_size to uint, placing it alongside other int fields to avoid padding. However, in fact on my build this increases the size of the nmethod structure from 240 to 248 bytes, which I would prefer to avoid. > > Instead, I propose storing metadata_size in the existing uint16_t field. The average metadata_size is approximately 140 bytes, and the maximum observed in practice is around 4 KB. While, like oops_size, this value is not formally guaranteed to remain below 64 KB, no cases have been observed where this limit is exceeded. 
A GUARANTEE check is included to immediately catch any overflow if it ever occurs. > > Testing: in progress. LGTM. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24965#pullrequestreview-2813293227 From kvn at openjdk.org Sat May 3 22:47:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 May 2025 22:47:55 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. 
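As a rough illustration of the distinction drawn in the description above (floating-point functions report problematic inputs through `NaN` or an infinity instead of throwing), the following sketch uses C++ standard library analogues. It is not the C2 implementation: the helper names are hypothetical, and IEEE 754 semantics without `-ffast-math` are assumed.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helpers, for illustration only (not HotSpot code).
// Each "problematic" input produces NaN or an infinity as an ordinary
// return value: nothing is thrown and control flow is unaffected.

bool fmod_by_zero_is_nan() {
  volatile double zero = 0.0;  // volatile discourages compile-time folding
  return std::isnan(std::fmod(1.0, zero));
}

bool sqrt_of_negative_is_nan() {
  volatile double minus_one = -1.0;
  return std::isnan(std::sqrt(minus_one));
}

bool log_of_zero_is_negative_infinity() {
  volatile double zero = 0.0;
  return std::log(zero) == -std::numeric_limits<double>::infinity();
}
```

Because such calls only produce a value, dropping one whose result is unused does not change program behavior. Integer division does not qualify, since, as the description notes, dividing by zero throws.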
> > Thanks, > Marc Hi @marc-chevalier > doesn't propose a way to move pure calls around I agree that we should not do that in these changes. But did you consider to move/clone such call (new macro node) **down** to "users" in case the result is not used on some paths? They will be executed only where they are needed. And I think it is safe since current control dominates paths where the result is used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2848841268 From kvn at openjdk.org Sat May 3 23:35:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 3 May 2025 23:35:45 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Sat, 3 May 2025 04:10:01 GMT, Ashutosh Mehra wrote: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. 
We need to do something about Compressed Klass base:

java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:+AOTAdapterCaching -XX:+AOTStubCaching -Xlog:aot+codecache+init=debug -cp hello.jar HelloWorld
[0.009s][debug][aot,codecache,init] Mapped 548864 bytes at address 0x00007f17b1611000 at AOT Code Cache
[0.009s][info ][aot,codecache,init] Loaded 384 AOT code entries from AOT Code Cache
[0.009s][debug][aot,codecache,init] Adapters: total=316
[0.009s][debug][aot,codecache,init] Shared Blobs: total=14
[0.009s][debug][aot,codecache,init] C1 Blobs: total=34
[0.009s][debug][aot,codecache,init] C2 Blobs: total=20
[0.009s][debug][aot,codecache,init] AOT code cache size: 543392 bytes
[0.009s][debug][aot,codecache,init] Loaded 1 C strings of total length 28 at offset 521860 from AOT Code Cache
[0.010s][debug][aot,codecache,init] AOT Code Cache disabled: it was created with CompressedKlassPointers::base() = 0x7f56c6000000 vs current 0x7f1762000000
[0.010s][info ][aot,codecache,init] Unable to use AOT Code Cache.
Hellow World!

Which blobs/stubs decompress/compress klass using the base? Maybe we should use Relocation for it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2848857110
PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2848857305
PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2848857738

From jbhateja at openjdk.org Sun May 4 07:54:55 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 4 May 2025 07:54:55 GMT
Subject: RFR: 8351950: C2: masked vector MIN/MAX AVX512: SIGFPE / no valid evex tuple_table entry
Message-ID: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com>

PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement.
AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on vector length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of the Intel SDM for more details.

For example, consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40. Both operate over a full 64-byte vector, hence scale N = 64. The displacement of the latter instruction is a multiple of the scale and its quotient fits in one byte, thus it can be represented by a 1-byte displacement encoding, while the former requires 4 bytes to represent the displacement in the instruction encoding.

1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24

    EVEX        | OP | MR | SIB | DISP        | IMM
    ------------|----|----|-----|-------------|----
    62 6b c1 40 | 25 | 84 | ec  | 40 30 20 10 | ff

2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24

For a full vector width operation, the scale matches the vector size, hence scale N = 64.
Effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1

    EVEX        | OP | MR | SIB | DISP | IMM
    ------------|----|----|-----|------|----
    62 6b c1 40 | 25 | 44 | ec  | 01   | ff

Kindly review and share your feedback.
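The disp8*N rule described above can be sketched as a small check. This is a minimal illustration, not the HotSpot assembler API: `try_compress_disp8` and its parameters are hypothetical names. Note that 0x10203040 is itself a multiple of 64; what rules it out is that the quotient must also fit in a signed byte.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not the HotSpot assembler API.
// Tries to encode `disp` as a compressed disp8*N displacement.
//   disp  - the full displacement in bytes
//   n     - the scale factor N in bytes (a power of two, e.g. 64 for a
//           full 64-byte vector access)
//   disp8 - receives the one-byte encoded value on success
bool try_compress_disp8(int32_t disp, int32_t n, int8_t* disp8) {
  if (n <= 0 || (n & (n - 1)) != 0) {
    return false;  // N must be a positive power of two
  }
  if (disp % n != 0) {
    return false;  // displacement is not a multiple of N
  }
  int32_t scaled = disp / n;
  if (scaled < -128 || scaled > 127) {
    return false;  // quotient does not fit in a signed byte
  }
  *disp8 = static_cast<int8_t>(scaled);
  return true;
}
```

With N = 64, displacement 0x40 compresses to the single byte 0x01 shown in the second encoding, while 0x10203040 falls back to the 4-byte form of the first.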
Best Regards, Jatin ------------- Commit messages: - 8351950: C2: masked vector MIN/MAX AVX512: SIGFPE / no valid evex tuple_table entry Changes: https://git.openjdk.org/jdk/pull/25021/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351950 Stats: 4047 lines in 37 files changed: 4046 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25021/head:pull/25021 PR: https://git.openjdk.org/jdk/pull/25021 From jbhateja at openjdk.org Mon May 5 03:57:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 03:57:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v11] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel AVX10 will be enumerated as Version 2 (denoted as Intel AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel AVX10 (Version 1, or Intel AVX10.1) that only enumerates the Intel AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Reveiw comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/051c416c..b314ed0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=09-10 Stats: 376 lines in 22 files changed: 25 ins; 68 del; 283 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From jbhateja at openjdk.org Mon May 5 03:57:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 03:57:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v11] In-Reply-To: References: Message-ID: <9d9DVuqRAeb_8kiEwkPQH6g2eBU5Jc_5ZSBAi1in9X0=.1d955598-f466-46ff-8b1f-71c87abd6313@github.com> On Sat, 3 May 2025 08:26:19 GMT, Vladimir Ivanov wrote: >> New implimentation directly modify the feature vector bits though macros. > > I prefer explicit accessor calls on corresponding instance fields. > > It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros. 
I have renamed this local routine to `install_feature_flags` so that the name conforms to its semantics. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072818174 From jbhateja at openjdk.org Mon May 5 04:06:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 04:06:02 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: References: Message-ID: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel AVX10 will be enumerated as Version 2 (denoted as Intel AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel AVX10 (Version 1, or Intel AVX10.1) that only enumerates the Intel AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: - Updating comment - Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/b314ed0e..7b414b8c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=10-11 Stats: 13 lines in 4 files changed: 0 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From amitkumar at openjdk.org Mon May 5 04:22:47 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 5 May 2025 04:22:47 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:01:49 GMT, Martin Doerr wrote: > Interesting. Thanks for finding it out! So, this makes the behavior different to all other platforms which write all bytes before the address which is not writable. I think the behaviour is still same with the C++ implementation. There might some more checks in C++, which tries to give better performance for specific `sizes`. But if store is unaligned then C++ implementation will also choose `mvc` instruction - which again fill the memory in same fashion as it is going to do now. 
With that being said, if you have further question, then I can try to find answers, Otherwise a approval will be nice to have ;-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2849872337 From chagedorn at openjdk.org Mon May 5 06:15:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 May 2025 06:15:46 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates In-Reply-To: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: <5CiPF4wz7WOJCkZQMY8jzcbGHt0KfjmhzzZnK_7pjpM=.c6edc2c0-6df1-40d3-b8ac-fce12ce85d54@github.com> On Fri, 2 May 2025 14:14:26 GMT, Christian Hagedorn wrote: > Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. > > Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. > > We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. 
But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). > > How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: > > ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) > > All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. > > This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier > is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. > > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25007#issuecomment-2850004686 From chagedorn at openjdk.org Mon May 5 06:15:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 May 2025 06:15:48 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs [v2] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:43:23 GMT, Marc Chevalier wrote: >> Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. 
>> >> The unmentioned ones: >> - `ccp` >> - `ciReplay` >> - `ciTypeFlow` >> - `compilercontrol` >> - `debug` >> - `oracle` >> - `predicates` >> - `print` >> - `relocations` >> - `sharedstubs` >> - `splitif` >> - `tiered` >> - `whitebox` >> >> And those, that are not test folders: >> - `lib` >> - `patches` >> - `testlibraries` >> >> I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. >> >> The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. >> >> Feel free to tell if other folders should be included (and in which tier). >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > speed up slowest test Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24817#pullrequestreview-2813874515 From epeter at openjdk.org Mon May 5 06:42:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 06:42:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v62] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 01:23:18 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. 
As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440, node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > alignment wording Some more minor comments around `adjust_lo`. src/hotspot/share/opto/rangeinference.cpp line 366: > 364: // We start with all bits where lo[x] == zeros[x] == 0: > 365: // 0 1 1 0 0 0 0 1 > 366: U either = lo | bits._zeros; I can let this one slide, but the code gives you exactly the negation of what the text describes. The reader might be confused about this, and have to figure out that the bits are all inverted. Not horrible, but not hard to fix either. Up to you. src/hotspot/share/opto/rangeinference.cpp line 370: > 368: // lo[x] == zeros[x] == 0. The last one of these bits must be at index i. > 369: // 0 1 1 0 0 0 0 0 > 370: U tmp = ~either & find_mask; To me a variable name `tmp` smells a little. I prefer expressive names. Up to you :) src/hotspot/share/opto/rangeinference.cpp line 379: > 377: // In our example, i == 2 > 378: // 0 0 1 0 0 0 0 0 > 379: U alignment = tmp & (-tmp); This line is still magic. Most compiler devs I know do not see this as "standard" math. Could be nice to at least refer to something one could find online on this. I slept on it and had a proof in mind: What do we know about `-tmp`? It cannot have any bits after the last bit set in `tmp`, otherwise those bits would not zero out in `tmp + -tmp`. `-tmp` must have the same last bit set as `tmp`, otherwise it would not cancel out. The addition of those bits creates a carry bit that must be canceled out all the way up.
This means that each bit before that last bit must be set in exactly one of `tmp` and `-tmp`, but certainly not in both, otherwise the carry bit would not be canceled away. Hence, only that last bit remains in `tmp & (-tmp)`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2813893517 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072898993 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072900385 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072904495 From epeter at openjdk.org Mon May 5 06:42:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 06:42:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v59] In-Reply-To: References: <6ky_9MDLXFrdtvVobVuxOoT4RadbG1jOslNkaS1x92s=.4104a1ef-3fab-4857-9aef-d8a7e26308c3@github.com> Message-ID: <5ymbffEtZDVcRc3t0hL3PJucIMlP8RW-DLPh8Oi_tlc=.696f390a-224c-42f1-9a4e-fd90764ba157@github.com> On Fri, 2 May 2025 16:38:59 GMT, Quan Anh Mai wrote: >> You are diverting too much from the base assumption of this function. Formally, this function assumes that a result exists, which means that `i` exists, which leads to `tmp != 0`. The converse is also true, if `tmp != 0`, an index value `i` exists, which leads to a value not smaller than `lo` and satisfies `bits`. This implies that there does not exist one such value if and only if `tmp == 0`. In that case we know exactly that what we return satisfies bits. That's all we need to know in this section. > > I changed the comment at the return point of this function to highlight this fact more clearly. To me it seemed you mentioned the case that there may be no such `i` from the beginning, where we have to return a value < lo. But I think it is ok now with your comments below as well.
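The `tmp & (-tmp)` step discussed in the review comments above can also be checked in isolation. The sketch below is a standalone illustration with hypothetical function names, not the code in `rangeinference.cpp`; it compares the bit trick against a straightforward scan from the least significant bit.

```cpp
#include <cstdint>

// Keeps only the lowest set bit of x (returns 0 if x == 0).
// Unsigned arithmetic wraps modulo 2^32, so 0u - x is the two's-complement
// negation of x without any signed-overflow concerns.
uint32_t lowest_set_bit(uint32_t x) {
  return x & (0u - x);
}

// Reference implementation: scan upward from bit 0 and return the first
// bit that is set in x.
uint32_t lowest_set_bit_reference(uint32_t x) {
  for (int i = 0; i < 32; i++) {
    uint32_t bit = 1u << i;
    if (x & bit) {
      return bit;
    }
  }
  return 0;
}
```

For the bit pattern `0 1 1 0 0 0 0 0` quoted in the review, `lowest_set_bit(0x60)` yields `0x20`, i.e. `0 0 1 0 0 0 0 0`.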
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072906513 From mchevalier at openjdk.org Mon May 5 06:44:44 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 5 May 2025 06:44:44 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc I've considered it, but rather for a follow-up. 
My thought was to first introduce the node types, removal mechanics and such, but keep it pined by control and not touch that in this change. In the follow-up, I was hoping I would have "just" the control-pinning problem to address. Moving the calls down may be beneficial in case the result is not used in a branch (and then we save the call when executing the branch not using it), but if the usage is in a loop, we rather want the call to stay (or be hoisted) before the loop. The heuristic "out of as many loops as possible, and the later possible" seems to also apply here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2850052986 From mchevalier at openjdk.org Mon May 5 06:46:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 5 May 2025 06:46:45 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs [v2] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:43:23 GMT, Marc Chevalier wrote: >> Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. >> >> The unmentioned ones: >> - `ccp` >> - `ciReplay` >> - `ciTypeFlow` >> - `compilercontrol` >> - `debug` >> - `oracle` >> - `predicates` >> - `print` >> - `relocations` >> - `sharedstubs` >> - `splitif` >> - `tiered` >> - `whitebox` >> >> And those, that are not test folders: >> - `lib` >> - `patches` >> - `testlibraries` >> >> I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. >> >> The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. >> >> Feel free to tell if other folders should be included (and in which tier). 
>> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > speed up slowest test Thanks @lmesnik @vnkozlov and @chhagedorn for comments and reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2850055256 From duke at openjdk.org Mon May 5 06:46:45 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 06:46:45 GMT Subject: RFR: 8354284: Add more compiler test folders to tier1 runs [v2] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:43:23 GMT, Marc Chevalier wrote: >> Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. >> >> The unmentioned ones: >> - `ccp` >> - `ciReplay` >> - `ciTypeFlow` >> - `compilercontrol` >> - `debug` >> - `oracle` >> - `predicates` >> - `print` >> - `relocations` >> - `sharedstubs` >> - `splitif` >> - `tiered` >> - `whitebox` >> >> And those, that are not test folders: >> - `lib` >> - `patches` >> - `testlibraries` >> >> I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. >> >> The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. >> >> Feel free to tell if other folders should be included (and in which tier). >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > speed up slowest test @marc-chevalier Your change (at version 3232e5b8b2424ee75683fbf387fead6c016987d3) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24817#issuecomment-2850056181 From jbhateja at openjdk.org Mon May 5 06:51:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 06:51:48 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 06:56:06 GMT, Jatin Bhateja wrote: >> @jatin-bhateja Thanks for the updates! I have a few more requests :) > > Hi @eme64 , I have addressed and responded to your comments, please verify. > @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress > > ``` > // For upper bound estimation of result value range with a constant input we > // pessimistically pick max_int value to prevent incorrect constant folding > // in case input equals above estimated lower bound. > hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); > hi = result_bit_width < mask_bit_width ? (1L << result_bit_width) - 1 : hi; > ``` > > Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? Let's assume the following - The input was a constant value Integer.MIN_VALUE, hence ideal type TypeInt will have both _lo and _hi set to MIN_VALUE, - Currently, the _lo value of the result value range flips between 0 and MIN_VALUE, let's take that to be MIN_VALUE in our case. Earlier, the _hi value of the result value range was set to the _hi value of the source value range. ` hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : src_type->hi_as_long(); ` If the result bit width was less than the maximum bit width of the integral type, in that case both _hi and _lo values were being set to MIN_VALUE, resulting in a constant value.
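To see why folding the result to the constant MIN_VALUE would be incorrect when only the source is known, here is a minimal software model of the 32-bit compress operation. It assumes the documented behavior of `java.lang.Integer.compress`; `compress32` is a hypothetical name and this is not the `CompressBitsNode::Value` code.

```cpp
#include <cstdint>

// Hypothetical software model of a 32-bit bit-compress, assumed to follow
// the documented semantics of java.lang.Integer.compress: the bits of `src`
// at positions selected by `mask` are packed contiguously into the result,
// starting at the least significant bit.
uint32_t compress32(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  int out = 0;  // next free bit position in the result
  for (int i = 0; i < 32; i++) {
    uint32_t bit = 1u << i;
    if (mask & bit) {
      if (src & bit) {
        result |= 1u << out;
      }
      out++;
    }
  }
  return result;
}
```

With the source fixed at MIN_VALUE (`0x80000000`), a mask of `0xFFFFFFFF` yields `0x80000000`, a mask of `0x80000000` yields `1`, and a mask of `0x7FFFFFFF` yields `0`. The result therefore still depends on the mask, so a result range with `_lo == _hi == MIN_VALUE` is too narrow unless the mask is also known.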
------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850066186 From mchevalier at openjdk.org Mon May 5 06:59:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 5 May 2025 06:59:51 GMT Subject: Integrated: 8354284: Add more compiler test folders to tier1 runs In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:44:04 GMT, Marc Chevalier wrote: > Some folders in jtreg/compiler have been reported not to be run in any tier, while tier1 was probably intended, but the tier definition was mistakenly not updated. I've checked which folders are not referenced into `TEST.groups`. > > The unmentioned ones: > - `ccp` > - `ciReplay` > - `ciTypeFlow` > - `compilercontrol` > - `debug` > - `oracle` > - `predicates` > - `print` > - `relocations` > - `sharedstubs` > - `splitif` > - `tiered` > - `whitebox` > > And those, that are not test folders: > - `lib` > - `patches` > - `testlibraries` > > I'm adding `ccp`, `ciTypeFlow`, `predicates`, `sharedstubs` and `splitif` to tier1. > > The other folders seems to have been around for very long (since at least mid-2021). It's not clear how meaningful it'd be to add them/what the intent from them was. I've rather focused on the recently(-ish) added folders, that one forgot to put in a tier when adding it. > > Feel free to tell if other folders should be included (and in which tier). > > Thanks, > Marc This pull request has now been integrated. 
Changeset: 69d0f7a3 Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/69d0f7a3954048da358bd2ac5ab458fb37fa25a6 Stats: 9 lines in 2 files changed: 6 ins; 1 del; 2 mod 8354284: Add more compiler test folders to tier1 runs Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24817 From epeter at openjdk.org Mon May 5 07:14:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 07:14:06 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v62] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 01:23:18 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > alignment wording Just looked over the rest. 
Wow, now we are very very close, exciting :) src/hotspot/share/opto/rangeinference.cpp line 480: > 478: if (h < ~bounds._hi) { > 479: return AdjustResult>::make_empty(); > 480: } Another nit: I feel like the "overflow" case here would not have to spill outside of `adjust_lo`. And `Optional` style return value would make more sense for the reader at this point, then the reader does not have to worry about why we do a comparison here, and does not have to dive deeper into `adjust_lo`. I leave this up to you though. src/hotspot/share/opto/rangeinference.cpp line 717: > 715: {MAX2(i1->_ulo, i2->_ulo), MIN2(i1->_uhi, i2->_uhi)}, > 716: {i1->_bits._zeros | i2->_bits._zeros, i1->_bits._ones | i2->_bits._ones}}, > 717: MIN2(i1->_widen, i2->_widen), true); You need to at least indent the `meet` comment. I would also prefer if you had a `else` block and indent there equally, just for optical balance. But I leave that up to you ;) src/hotspot/share/utilities/intn_t.hpp line 39: > 37: // nbits == 16 gives a type equivalent to int16_t, and so on. This class may be > 38: // used to verify the correctness of an algorithm that is supposed to be > 39: // applicable to all fixed-width integral types. With a few bits, it makes it Suggestion: // applicable to all fixed-width integral types. With small nbits, it makes it src/hotspot/share/utilities/intn_t.hpp line 44: > 42: // Implementation-wise, this class currently only supports 0 < nbits <= 8. Also > 43: // note that this class is implemented so that overflows in alrithmetic > 44: // operations are well-defined and wrap-around. Suggestion: // operations are well-defined and wrap-around, just like jint, juint, jlong and julong. 
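To illustrate the wrap-around semantics that the comment above describes, here is a minimal Java sketch. It is not the HotSpot `intn_t` class: the name `IntN`, its methods, and the 4-bit width used in `main` are assumptions made for this example. The point it tries to show is that every operation has to re-normalize the higher bits, which is where a shift or an equality check can easily go wrong.

```java
// Hypothetical illustration only; this is not the HotSpot intn_t implementation.
final class IntN {
    private final int nbits;  // width in bits, assumed to be in 1..8
    private final int value;  // invariant: always stored sign-extended to 32 bits

    IntN(int nbits, int raw) {
        if (nbits <= 0 || nbits > 8) {
            throw new IllegalArgumentException("nbits must be in 1..8");
        }
        this.nbits = nbits;
        this.value = wrap(nbits, raw);
    }

    // Keep the low nbits of raw and sign-extend them to a full int.
    static int wrap(int nbits, int raw) {
        int shift = 32 - nbits;
        return (raw << shift) >> shift;
    }

    // Addition wraps around because the constructor re-normalizes the sum.
    IntN add(IntN other) {
        return new IntN(nbits, value + other.value);
    }

    // Arithmetic shift: the sign-extended representation already has the right higher bits.
    IntN shiftRight(int s) {
        return new IntN(nbits, value >> s);
    }

    // Logical shift: the higher bits must be cleared first, otherwise sign bits leak in.
    IntN unsignedShiftRight(int s) {
        return new IntN(nbits, toUnsignedInt() >>> s);
    }

    int toInt() {
        return value;
    }

    int toUnsignedInt() {
        return value & ((1 << nbits) - 1);
    }

    @Override
    public boolean equals(Object o) {
        // Comparing the normalized values is only correct because of the invariant on `value`.
        return o instanceof IntN && ((IntN) o).nbits == nbits && ((IntN) o).value == value;
    }

    @Override
    public int hashCode() {
        return 31 * nbits + value;
    }

    public static void main(String[] args) {
        IntN seven = new IntN(4, 7);
        IntN min = seven.add(new IntN(4, 1));            // 7 + 1 wraps to -8 in 4 bits
        System.out.println(min.toInt());
        System.out.println(min.unsignedShiftRight(1).toInt());
        // Naive variant that forgets to clear the higher bits before shifting:
        System.out.println(IntN.wrap(4, min.toInt() >>> 1));
    }
}
```

In this sketch the naive logical shift of the 4-bit value -8 yields -4 rather than 4, which is the kind of higher-bits mistake a small exhaustive test would catch.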
------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2813916929 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072911927 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072925047 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072935663 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072936373 From epeter at openjdk.org Mon May 5 07:14:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 07:14:07 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v54] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 00:55:07 GMT, Quan Anh Mai wrote: >> Hmm ok. I think someone with a deeper knowledge of the type system has to check this. To me this does not make sense, but you clearly have much more knowledge here. > > I hope it is clearer now. This function always calculates the union of 2 `Type` instances. It is just that the `TypeInt`s have their subset relationship reversed if `_is_dual` is `true`, which makes it look like we are calculating the `join` but only when both arguments are `TypeInt`s. This comes from the fact that the `meet` of 2 `Type`s is the dual of the join of the 2 duals of the incoming `Type`s. Of course this duality dance is pretty convoluted and I am thinking about getting rid of it and calculating the join like a normal person. Yeah, it sounds like a little bit of technical debt here, that keeps confusing most of us. Thanks for the explanations!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072923418 From epeter at openjdk.org Mon May 5 07:14:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 07:14:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v56] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:51:45 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/type.hpp line 293: >> >>> 291: >>> 292: template >>> 293: const TypeClass* try_cast() const; >> >> This is a way to get to the `isa_...` via templated types, right? >> >> I wonder if it might be better to name it `isa_???`, or even just `isa`, so that it is clearer that it is about the `isa` query. >> >> Currently, it is not very clear what it does, until you look at the implementation. That's a bit unfortunate. > > I like the naming `try_cast` better because it aligns with the semantics of `std::dynamic_cast`. `isa` is a bad name. Ok, I don't care enough either way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072933355 From epeter at openjdk.org Mon May 5 07:14:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 07:14:08 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: References: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> Message-ID: On Fri, 2 May 2025 11:35:33 GMT, Quan Anh Mai wrote: >> Those also have a lot more operations to test... > > The thing is that the other operations are so trivial that it would be counter-productive to test them, it is like testing `add(int x, int y) { return x + y; }` :) The operations I test here are the non-trivial ones, that is sign extension and comparison. I have added some sanity `static_assert` to catch off-by-one errors, though. Well, it could be relatively easy to get a `>>` or equal operator wrong, because of the higher bits. 
I tend to get these things wrong, and tests save me there. You are not me, so I leave it up to you ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2072939922 From dfenacci at openjdk.org Mon May 5 07:26:45 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 5 May 2025 07:26:45 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 19:00:23 GMT, Aleksey Shipilev wrote: > There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. > > But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. > > > $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ > -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done > > # Before > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > # After > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > > It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. 
> > Additional testing: > - [x] New regression test passes with the fix, fails without it > - [ ] GHA Thanks @shipilev for fixing this (it might be a minor bug but still... inconsistent) test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 64: > 62: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(args); > 63: OutputAnalyzer output = new OutputAnalyzer(pb.start()); > 64: output.shouldHaveExitValue(0); I was wondering if we should check the output as well, e.g. with a test that prints the actual number of compiler threads (like the one in the description, to make it a bit more like a regression test). ------------- PR Review: https://git.openjdk.org/jdk/pull/24972#pullrequestreview-2813980617 PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2072949819 From rcastanedalo at openjdk.org Mon May 5 07:53:49 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 5 May 2025 07:53:49 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v11] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 08:57:20 GMT, Emanuel Peter wrote: > And then it seems that the 32x unrolling leads to some interesting use of registers. I think that the issue is that first all loads are done, and we don't have enough regular registers, so we start pushing to `xmm` registers. And later move them back to regular registers. That creates a very long loop, and that is not very efficient. I don't know the code well but would expect `OptoRegScheduling` to mitigate this issue by producing a more register-pressure aware schedule. Do we know what is preventing that?
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2850187055 From epeter at openjdk.org Mon May 5 08:15:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 08:15:47 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 06:49:29 GMT, Jatin Bhateja wrote: >> Hi @eme64 , I have addressed and responded to your comments, please verify. > >> @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress ? >> >> ``` >> // For upper bound estimation of result value range with a constant input we >> // pessimistically pick max_int value to prevent incorrect constant folding >> // in case input equals above estimated lower bound. >> hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> hi = result_bit_width < mask_bit_width ? (1L << result_bit_width) - 1 : hi; >> ``` >> >> Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? > > Let's assume the following > - The input was a constant value Integer.MIN_VALUE, hence ideal type TypeInt will have both _lo and _hi set to MIN_VALUE, > - Currently, _lo value of result value range flip b/w 0 or MIN_VALUE, lets take that to be MIN_VALUE in our case. > > Earlier _hi value of the result value range was set to _hi value of the source value range. > > ` hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : src_type->hi_as_long(); > ` > > If the result bit width was less than the maximum bit width of the integral type, in that case both _hi and _lo values were being set to MIN_VALUE resulting into a constant value. @jatin-bhateja Thanks for explaining the case with `MIN_VALUE`. I suppose the same could happen if `lo = 0` and `src_type->_hi = 0`? I would still like to see a argument/proof of line `hi = src_type->hi_as_long() == lo ? 
hi : src_type->hi_as_long();` Let's assume `mask_type = [min_int, max_int] = int`. `lo = min_int`, `hi = max_int` before the line in question. Now lets assume `src_type = [min_int+1, -1]`. So now the line in question sets `hi = src_type->hi_as_long() = -1`. And the line below does not change it, because `result_bit_width < mask_bit_width`. So we now have a remaining range `lo = min_int, hi = -1`, i.e. we return a type `[min_int, -1]`. But imagine at runtime we have `mask_type = 0`, then obviously the result is `0`, which is outside the bounds! Here the counter-example: public class Test { public static int test(int src, int mask) { // src_type = [min_int + 1, -1] src = Math.max(Integer.MIN_VALUE + 1, Math.min(src, -1)); int result = Integer.compress(src, mask); // The type is now calculated to be #int:<=-1 // Hence, the test below must always be true. // But at runtime we only pass in mask = 0, so result should be 0. if (result < 0) { throw new RuntimeException("woopsies " + result); } return result; } public static void main(String[] args) { for (int i = 0; i < 10_000; i++) { test(0, 0); } } } Running it with: `./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+PrintIdeal Test.java`: CompileCommand: compileonly Test.test bool compileonly = true AFTER: print_ideal 0 Root === 0 42 [[ 0 1 3 23 24 37 ]] inner 1 Con === 0 [[ ]] #top 3 Start === 3 0 [[ 3 5 6 7 8 9 10 11 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:int, 6:int} 5 Parm === 3 [[ 38 ]] Control !jvms: Test::test @ bci:-1 (line 4) 6 Parm === 3 [[ 38 ]] I_O !jvms: Test::test @ bci:-1 (line 4) 7 Parm === 3 [[ 38 ]] Memory Memory: @BotPTR *+bot, idx=Bot; !jvms: Test::test @ bci:-1 (line 4) 8 Parm === 3 [[ 38 42 ]] FramePtr !jvms: Test::test @ bci:-1 (line 4) 9 Parm === 3 [[ 38 ]] ReturnAdr !jvms: Test::test @ bci:-1 (line 4) 10 Parm === 3 [[ 25 ]] Parm0: int !jvms: Test::test @ bci:-1 (line 4) 11 Parm === 3 [[ 27 ]] Parm1: int !jvms: Test::test @ bci:-1 (line 4) 23 ConI === 0 
[[ 26 ]] #int:min+1 24 ConI === 0 [[ 25 ]] #int:-1 25 MinI === _ 10 24 [[ 26 ]] !jvms: Test::test @ bci:4 (line 4) 26 MaxI === _ 23 25 [[ 27 ]] !jvms: Test::test @ bci:7 (line 4) 27 CompressBits === _ 26 11 [[ 38 ]] #int:<=-1:www !jvms: Test::test @ bci:13 (line 5) 37 ConI === 0 [[ 38 ]] #int:22 38 CallStaticJava === 5 6 7 8 9 (37 1 1 27 ) [[ 39 ]] # Static uncommon_trap(reason='unloaded' action='reinterpret' index='22' debug_id='0') void ( int ) C=0.000100 Test::test @ bci:21 (line 10) !jvms: Test::test @ bci:21 (line 10) 39 Proj === 38 [[ 42 ]] #0 !jvms: Test::test @ bci:21 (line 10) 42 Halt === 39 1 1 8 1 [[ 0 ]] !jvms: Test::test @ bci:21 (line 10) Exception in thread "main" java.lang.RuntimeException: woopsies 0 at Test.test(Test.java:10) at Test.main(Test.java:17) You can see that the exception is wrongly thrown, and you can see the wrong type of the `CompressBits`. If I run it in the interpreter only, I do not see the exception: `./java -Xbatch -XX:CompileCommand=compileonly,Test::test -XX:+PrintIdeal -Xint Test.java` --------------------------- We got this code wrong before, and now again. How can we gain confidence that it will be correct on the next attempt? My Opinion? I really want to see a solid **proof** of this code. Because it is so easy to get these things wrong. And it seems our tests are also not good enough to catch this. So we obviously **need better tests** too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850233348 From duke at openjdk.org Mon May 5 08:22:29 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 5 May 2025 08:22:29 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v7] In-Reply-To: References: Message-ID: <6SzsrZTCIZqTp9AHuCSIql0zrO9mZYz74SlWSILuC88=.8f69c499-b570-422b-8c1f-ca62289bcf10@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware.
> > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: reorder instructions to make RVV instructions contiguous ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/9ba27686..a64dc26e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=05-06 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From mdoerr at openjdk.org Mon May 5 08:46:48 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 5 May 2025 08:46:48 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: <10OuWNyTDvYdxuFhK8yjYIe6Vkm41SgUIHrsddqpqBM=.6c4a4f41-aa12-4db0-89de-dead4563f78f@github.com> On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation Thanks for checking the gcc generated code! Not sure if mvc usage should be treated as bug. I have no idea why the "atomic" version is used if it doesn't matter how much got written in case of a signal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2850310481 From dlunden at openjdk.org Mon May 5 09:04:49 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 5 May 2025 09:04:49 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> Message-ID: On Wed, 30 Apr 2025 10:09:53 GMT, Daniel Lundén wrote: >> test/hotspot/jtreg/compiler/loopopts/TestSplitIfPinnedLoadInStripMinedLoop.java line 141: >> >>> 139: >>> 140: // Same as test2 but with reference to inner loop induction variable 'j' and different order of instructions. >>> 141: // Triggers an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification is disabled: >> >> Is this test still valid? According to the comment it should trigger an assert but this assert appears to be removed? Is the test correct if the test is passing even though the assert has been removed? See my above comment on the removal of this assertion. > > Ah, good catch.
Let me try to verify that my new asserts also trigger if I revert the fix for [JDK-8260420](https://bugs.openjdk.org/browse/JDK-8260420). Unfortunately, it was not straightforward to revert the fix for [JDK-8260420](https://bugs.openjdk.org/browse/JDK-8260420) (too many changes since then). If `!LCA_orig->dominates(pred_block) || early->dominates(pred_block)` failed at some point, then the new assert `early->dominates(LCA_orig)` must also fail in that situation (in theory). See the details in my other response above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2073093682 From amitkumar at openjdk.org Mon May 5 09:11:50 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 5 May 2025 09:11:50 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: <10OuWNyTDvYdxuFhK8yjYIe6Vkm41SgUIHrsddqpqBM=.6c4a4f41-aa12-4db0-89de-dead4563f78f@github.com> References: <10OuWNyTDvYdxuFhK8yjYIe6Vkm41SgUIHrsddqpqBM=.6c4a4f41-aa12-4db0-89de-dead4563f78f@github.com> Message-ID: On Mon, 5 May 2025 08:43:39 GMT, Martin Doerr wrote: > Thanks for checking the gcc generated code! Not sure if mvc usage should be treated as bug. I have no idea why the "atomic" version is used if it doesn't matter how much got written in case of a signal. Any idea who can help us with that information, then?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2850370301 From duke at openjdk.org Mon May 5 09:23:05 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 5 May 2025 09:23:05 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v3] In-Reply-To: References: Message-ID: > Issue: The assertion failure `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list` occurs because [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) node after macro expansion. > > Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` call outside the elimination loop. > > Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in `expand_macro_nodes` is unintuitive, as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), and the current fix should be moved along with the other elimination code.
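The shape of the problem described above can be sketched with a toy model. This is not HotSpot code: the class `MacroListSketch`, the string node names, and both methods are hypothetical stand-ins, and the macro list is modeled as a plain `List<String>`. It only tries to show why an adjustment that adds a node during elimination breaks the "exactly one node removed" invariant, and why doing the adjustment in a separate earlier pass keeps it intact.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; the names echo the discussion but none of this is HotSpot code.
final class MacroListSketch {
    // Hypothetical stand-in for adjust_strip_mined_loop: refining the loop may add a macro node.
    static void adjust(List<String> macros) {
        macros.add("MaxL");
    }

    // Problematic shape: the adjustment runs inside the elimination step.
    // Returns whether the "count dropped by exactly one" invariant held.
    static boolean eliminateWithInlineAdjust(List<String> macros, String node) {
        int oldCount = macros.size();
        if (node.equals("OuterStripMinedLoop")) {
            adjust(macros);               // adds a node while we are eliminating one
        }
        macros.remove(node);
        return macros.size() == oldCount - 1;
    }

    // Shape of the fix: adjust in a separate pass first, then eliminate.
    static boolean adjustThenEliminate(List<String> macros) {
        for (String node : new ArrayList<>(macros)) {
            if (node.equals("OuterStripMinedLoop")) {
                adjust(macros);
            }
        }
        boolean invariantHeld = true;
        for (String node : new ArrayList<>(macros)) {
            if (!node.equals("OuterStripMinedLoop")) {
                continue;                 // this toy model only eliminates strip-mined loops
            }
            int oldCount = macros.size();
            macros.remove(node);
            invariantHeld &= macros.size() == oldCount - 1;
        }
        return invariantHeld;
    }

    public static void main(String[] args) {
        List<String> a = new ArrayList<>(List.of("OuterStripMinedLoop", "Allocate"));
        System.out.println(eliminateWithInlineAdjust(a, "OuterStripMinedLoop"));
        List<String> b = new ArrayList<>(List.of("OuterStripMinedLoop", "Allocate"));
        System.out.println(adjustThenEliminate(b));
    }
}
```

In the first call the list size is unchanged after the step (one node added, one removed), so the invariant check fails; in the second the added node is already present before elimination starts.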
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: moving the fix to a separate method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24890/files - new: https://git.openjdk.org/jdk/pull/24890/files/8d045cb1..a45e1340 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=01-02 Stats: 18 lines in 2 files changed: 11 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24890/head:pull/24890 PR: https://git.openjdk.org/jdk/pull/24890 From shade at openjdk.org Mon May 5 09:24:04 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 09:24:04 GMT Subject: RFR: 8356153: Shenandoah stubs are missing in AOT Code Cache addresses table Message-ID: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> See the bug for reproducer. We actually have similar Shenandoah hunks down in Leyden repository, but we have apparently missed them when upstreaming [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209). We only need a small subset of stubs that adapters use: pre-barriers and phantom load-barriers. This matches what we do for G1 and Z as well. 
Additional testing: - [x] Linux x86_64 server fastdebug, `runtime/cds` with `-XX:+UseShenandoahGC` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25028/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25028&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356153 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25028.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25028/head:pull/25028 PR: https://git.openjdk.org/jdk/pull/25028 From fyang at openjdk.org Mon May 5 09:31:45 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 May 2025 09:31:45 GMT Subject: RFR: 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java In-Reply-To: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> References: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> Message-ID: On Thu, 1 May 2025 11:31:50 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Originally, I was going to enable all test cases on riscv in this test file. But seems there was already a try to implement RoundDoubleModeV (which is IRNode.ROUND_DOUBLE_MODE_V) in https://github.com/openjdk/jdk/pull/21164, but failed because of some performance regression. > So I'll just enable part of test cases in this pr. > > Thanks! LGTM. Thanks. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24983#pullrequestreview-2814276408 From shade at openjdk.org Mon May 5 09:49:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 09:49:50 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Looking for more Reviewers, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2850471947 From shade at openjdk.org Mon May 5 09:54:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 09:54:20 GMT Subject: RFR: 8356122: Client build fails after JDK-8350209 Message-ID: See bug for samples of build failures. I reproduced and fixed both with this PR. Additional testing: - [x] Linux x86_64 client release build - [x] Linux x86_64 client fastdebug build ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25030/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25030&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356122 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25030/head:pull/25030 PR: https://git.openjdk.org/jdk/pull/25030 From duke at openjdk.org Mon May 5 10:17:27 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 5 May 2025 10:17:27 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: References: Message-ID: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. 
Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: change slli+add sequence to shadd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/a64dc26e..4e9ad18f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From jbhateja at openjdk.org Mon May 5 10:21:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 10:21:48 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 06:49:29 GMT, Jatin Bhateja wrote: >> Hi @eme64 , I have addressed and responded to your comments, please verify. > >> @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress ? >> >> ``` >> // For upper bound estimation of result value range with a constant input we >> // pessimistically pick max_int value to prevent incorrect constant folding >> // in case input equals above estimated lower bound. >> hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> hi = result_bit_width < mask_bit_width ? (1L << result_bit_width) - 1 : hi; >> ``` >> >> Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? > > Let's assume the following > - The input was a constant value Integer.MIN_VALUE, hence ideal type TypeInt will have both _lo and _hi set to MIN_VALUE, > - Currently, _lo value of result value range flip b/w 0 or MIN_VALUE, lets take that to be MIN_VALUE in our case. 
> > Earlier _hi value of the result value range was set to _hi value of the source value range. > > ` hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : src_type->hi_as_long(); > ` > > If the result bit width was less than the maximum bit width of the integral type, in that case both _hi and _lo values were being set to MIN_VALUE resulting into a constant value. > @jatin-bhateja Thanks for explaining the case with `MIN_VALUE`. I suppose the same could happen if `lo = 0` and `src_type->_hi = 0`? > This patch handles constant folding of ZERO value upfront during ::Value transforms; we only land here to constrain the value range. **Glad you are picking up value range-based optimization**, these are tricky ones, and **that is why we need a more robust infrastructure like KnownBits**, which makes the job easy. Let me tune this check and update the test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850543682 From fyang at openjdk.org Mon May 5 10:24:45 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 5 May 2025 10:24:45 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB In-Reply-To: References: Message-ID: On Fri, 2 May 2025 12:19:53 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add SUADD/SADD/SUSUB/SSUB for vector api? > > Thanks! > > ## Test > > running in progress ... Seems OK. I only have one minor comment. src/hotspot/cpu/riscv/riscv_v.ad line 696: > 694: match(Set dst_src (SaturatingAddV (Binary dst_src src1) v0)); > 695: ins_cost(VEC_COST); > 696: format %{ "vsadd_masked $dst_src, $dst_src, $src1" %} Nit: Seems the mask register (`v0`) is missing in opto asm for these masked operations. For integrity, we always print the mask register as the last operand for other masked nodes.
`format %{ "vsadd_masked $dst_src, $dst_src, $src1, $v0" %}` ------------- PR Review: https://git.openjdk.org/jdk/pull/25005#pullrequestreview-2814374168 PR Review Comment: https://git.openjdk.org/jdk/pull/25005#discussion_r2073190452 From epeter at openjdk.org Mon May 5 10:42:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 10:42:48 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 06:49:29 GMT, Jatin Bhateja wrote: >> Hi @eme64 , I have addressed and responded to your comments, please verify. > >> @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress ? >> >> ``` >> // For upper bound estimation of result value range with a constant input we >> // pessimistically pick max_int value to prevent incorrect constant folding >> // in case input equals above estimated lower bound. >> hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> hi = result_bit_width < mask_bit_width ? (1L << result_bit_width) - 1 : hi; >> ``` >> >> Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? > > Let's assume the following > - The input was a constant value Integer.MIN_VALUE, hence ideal type TypeInt will have both _lo and _hi set to MIN_VALUE, > - Currently, _lo value of result value range flip b/w 0 or MIN_VALUE, lets take that to be MIN_VALUE in our case. > > Earlier _hi value of the result value range was set to _hi value of the source value range. > > ` hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : src_type->hi_as_long(); > ` > > If the result bit width was less than the maximum bit width of the integral type, in that case both _hi and _lo values were being set to MIN_VALUE resulting into a constant value. 
@jatin-bhateja Well, `KnownBits` will only make testing more difficult, it does not remove challenges, rather increases the challenges. At that point we do not only have to test constants, and ranges, but also all sorts of bit patterns. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850587647 From epeter at openjdk.org Mon May 5 10:42:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 10:42:50 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 08:36:50 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resoultions > Let me tune this check and update the test. For me to approve this code, you will have to do more than that. I will need: - Proof of the implemented logic. - More tests. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850590832 From qamai at openjdk.org Mon May 5 10:46:48 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 5 May 2025 10:46:48 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 06:49:29 GMT, Jatin Bhateja wrote: >> Hi @eme64 , I have addressed and responded to your comments, please verify. > >> @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress ? >> >> ``` >> // For upper bound estimation of result value range with a constant input we >> // pessimistically pick max_int value to prevent incorrect constant folding >> // in case input equals above estimated lower bound. >> hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> hi = result_bit_width < mask_bit_width ? (1L << result_bit_width) - 1 : hi; >> ``` >> >> Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? > > Let's assume the following > - The input was a constant value Integer.MIN_VALUE, hence ideal type TypeInt will have both _lo and _hi set to MIN_VALUE, > - Currently, _lo value of result value range flip b/w 0 or MIN_VALUE, lets take that to be MIN_VALUE in our case. > > Earlier _hi value of the result value range was set to _hi value of the source value range. > > ` hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : src_type->hi_as_long(); > ` > > If the result bit width was less than the maximum bit width of the integral type, in that case both _hi and _lo values were being set to MIN_VALUE resulting into a constant value. @jatin-bhateja This operation is non-trivial, I expect the level of coverage to be on par with #23089. If you want to have a quick fix, I suggest removing all the logic and simply returning the bottom type. 
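For readers following the `MIN_VALUE` case discussed above, here is a minimal sketch of why a constant input does not imply a constant result when the mask is variable. It is a plain bit-by-bit reference model of the `Integer.compress(src, mask)` semantics, not the C2 `CompressBitsNode::Value` code; the function name `compress_bits` is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Reference model (an assumption-level sketch, not HotSpot code) of
// Integer.compress(src, mask): the bits of src selected by the set bits of
// mask are packed, in order, into the low bits of the result.
uint32_t compress_bits(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  uint32_t out_bit = 1;                      // next output position
  for (uint32_t m = mask; m != 0; m &= m - 1) {
    uint32_t lowest = m & (0u - m);          // lowest set bit of the remaining mask
    if ((src & lowest) != 0) {
      result |= out_bit;
    }
    out_bit <<= 1;                           // unsigned shift, wraps to 0 after bit 31
  }
  return result;
}
```

With `src` fixed to the bit pattern of `Integer.MIN_VALUE` (`0x80000000`), the result still depends on the mask: an all-ones mask gives `0x80000000`, the mask `0x80000000` gives `1`, and a zero mask gives `0`. A result type with `_lo == _hi == MIN_VALUE` would therefore be too narrow.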
------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850599543 From jbhateja at openjdk.org Mon May 5 11:13:46 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 11:13:46 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: <1DhsnJRTaLtm043iNvSB4qqg3HtGlt1-4HWptfg4Kv8=.38169f8a-cfd7-4995-a92b-b35e58d4f62b@github.com> On Mon, 5 May 2025 06:49:29 GMT, Jatin Bhateja wrote: >> Hi @eme64 , I have addressed and responded to your comments, please verify. > >> @jatin-bhateja Thanks for the updates! I think I now understand everything except this, so we are making good progress ? >> >> ``` >> // For upper bound estimation of result value range with a constant input we >> // pessimistically pick max_int value to prevent incorrect constant folding >> // in case input equals above estimated lower bound. >> hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); >> hi = result_bit_width < mask_bit_width ? (1L << result_bit_width) - 1 : hi; >> ``` >> >> Can you please explain it with an example, and walk me through the steps to the incorrect constant folding? > > Let's assume the following > - The input was a constant value Integer.MIN_VALUE, hence ideal type TypeInt will have both _lo and _hi set to MIN_VALUE, > - Currently, _lo value of result value range flip b/w 0 or MIN_VALUE, lets take that to be MIN_VALUE in our case. > > Earlier _hi value of the result value range was set to _hi value of the source value range. > > ` hi = mask_max_bw < max_bw ? (1L << mask_max_bw) - 1 : src_type->hi_as_long(); > ` > > If the result bit width was less than the maximum bit width of the integral type, in that case both _hi and _lo values were being set to MIN_VALUE resulting into a constant value. > @jatin-bhateja Well, `KnownBits` will only make testing more difficult, it does not remove challenges, rather increases the challenges. 
At that point we do not only have to test constants, and ranges, but also all sorts of bit patterns.

@eme64, glad you are picking that up. I don't want to comment on KnownBits in this PR; I will add my review suggestions on #17508. I need a little time to refresh my memory, but we are excited to contribute to it.

I think for this bug fix it's better to be safe for now; let me update the revision and test.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2850658201

From thartmann at openjdk.org Mon May 5 11:35:49 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 5 May 2025 11:35:49 GMT
Subject: RFR: 8258229: Crash in nmethod::reloc_string_for [v2]
In-Reply-To:
References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com>
Message-ID:

On Tue, 29 Apr 2025 12:56:07 GMT, Manuel Hässig wrote:

>> ## Issue Summary
>>
>> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. In case that deoptimization makes the method not entrant, then the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails in case the operands of the original and patched instructions do not match.
>>
>> ## Change Summary
>>
>> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
>>
>> All changes of this PR summarized:
>> - add a regression test,
>> - update the relocation information after patching the method entry for making it not entrant,
>> - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing.
>>
>> ## Testing
>>
>> I ran tiers 1 through 3 and Oracle internal testing.
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Merge branch 'master' into jdk-8258229-nmethod
> - Add DeoptimizeALot and fix typo in test
> - Hold NMethodState_lock while printing an nmethod
>
>   This prevents data races on the relocation info when code is patched.
> - Update relocation info when making method not entrant
> - Add regression test

Great job narrowing this down! The fix looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24831#pullrequestreview-2814520554

From rcastanedalo at openjdk.org Mon May 5 11:37:04 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 5 May 2025 11:37:04 GMT
Subject: RFR: 8354520: IGV: dump contextual information [v6]
In-Reply-To:
References:
Message-ID:

> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced.
The changeset dumps, for every compilation, the following additional properties: > > - JVM arguments > - platform information > - JVM version information > - date and time > - process ID > - (compiler) thread ID > > ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) > > Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: > > ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) > > This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): > > ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) > > The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: > > > define igv > p igv_print(true, $sp, $fp, $pc) > end > > define igv_node > p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) > end > > > Thanks to @TobiHartmann for providing useful feedback! > > #### Testing > > - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). > - Tested interactive usage manually via `gdb` and `rr` on linux-x64. > - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. 
Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:

 - Dump CPU features
 - Make frame pointer parameters const whenever possible
 - Pass pointer to initial frame to print_stack
 - Refactor loop
 - Inline _current into its only use
 - Improve naming and commenting of stack-walking predicates
 - Extend comments with debugger usage examples

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/24724/files
 - new: https://git.openjdk.org/jdk/pull/24724/files/dd1ad6ad..cb781a95

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24724&range=04-05

Stats: 46 lines in 6 files changed: 16 ins; 3 del; 27 mod
Patch: https://git.openjdk.org/jdk/pull/24724.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24724/head:pull/24724

PR: https://git.openjdk.org/jdk/pull/24724

From qamai at openjdk.org Mon May 5 11:51:50 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Mon, 5 May 2025 11:51:50 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v63]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>
> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>
> This is extracted from #15440, node value transformations are left for later PRs.
I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Emanuel's reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/56ffe4f2..6be30c51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=62 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=61-62 Stats: 39 lines in 2 files changed: 10 ins; 5 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Mon May 5 11:51:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 5 May 2025 11:51:52 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v62] In-Reply-To: References: Message-ID: <1pXrKl_x2o6pUZTRPuQYybroDad0b5k3SslhVE0Rhl8=.5eb36e99-2018-4686-8085-fd4b425327dc@github.com> On Mon, 5 May 2025 06:27:14 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> alignment wording > > src/hotspot/share/opto/rangeinference.cpp line 366: > >> 364: // We start with all bits where lo[x] == zeros[x] == 0: >> 365: // 0 1 1 0 0 0 0 1 >> 366: U either = lo | bits._zeros; > > I can let this one slide, but the code gives you exactly the negation of what the text describes. The reader might be confused about this, and have to figure out that the bits are all inverted. Not horrible, but not hard to fix either. Up to you. Nice catch, I changed it to `neither` and do the `not` here. > src/hotspot/share/opto/rangeinference.cpp line 370: > >> 368: // lo[x] == zeros[x] == 0. The last one of these bits must be at index i. 
>> 369:     // 0 1 1 0 0 0 0 0
>> 370:     U tmp = ~either & find_mask;
>
> To me a variable name `tmp` smells a little. I prefer expressive names. Up to you :)

I found the name `neither_upto_first_violation`, which seems more expressive; I'm not sure whether it may cause any confusion.

> src/hotspot/share/opto/rangeinference.cpp line 379:
>
>> 377:     // In our example, i == 2
>> 378:     // 0 0 1 0 0 0 0 0
>> 379:     U alignment = tmp & (-tmp);
>
> This line is still magic. Most compiler devs I know do not see this as "standard" math. Could be nice to at least refer to something one could find online on this.
>
> I did sleep over it and had a proof in mind:
> What do we know about `-tmp`? It cannot have any bits after the last bit set in `tmp`, otherwise those bits would not zero out in `tmp + -tmp`. `-tmp` must have the same last bit set as `tmp`, otherwise it would not cancel out. The addition of those bits creates a carry bit, which must be cancelled out all the way up. This means that the bits before that last bit must be set either exactly in `tmp` or in `-tmp`, but certainly not in both, otherwise the carry bit would not be cancelled away. Hence, only that last bit remains in `tmp & (-tmp)`.

I refer to the x86 `blsi` instruction, which does exactly this. The doc also says that it does `(-SRC) bitwiseAND (SRC)`. https://www.felixcloutier.com/x86/blsi

> src/hotspot/share/opto/rangeinference.cpp line 480:
>
>> 478:     if (h < ~bounds._hi) {
>> 479:       return AdjustResult>::make_empty();
>> 480:     }
>
> Another nit: I feel like the "overflow" case here would not have to spill outside of `adjust_lo`.
> An `Optional`-style return value would make more sense for the reader at this point; then the reader does not have to worry about why we do a comparison here, and does not have to dive deeper into `adjust_lo`.
>
> I leave this up to you though.

That's a good point, however I think I will do this later as we don't have an `Optional` in Hotspot yet.
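The `tmp & (-tmp)` step discussed above for rangeinference.cpp line 379 can be checked in isolation. The following is a small sketch under stated assumptions: the helper name `lowest_set_bit` is hypothetical (it is not a HotSpot function), and the negation is written as `U(0) - x` on an unsigned type so that it wraps around rather than relying on signed overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Isolate the lowest set bit of x (returns 0 when x == 0).
// Hypothetical helper for illustration; only unsigned types are accepted so
// that the subtraction below is well-defined wrap-around arithmetic.
template <typename U>
constexpr U lowest_set_bit(U x) {
  static_assert(std::is_unsigned<U>::value, "use an unsigned type");
  return static_cast<U>(x & (U(0) - x));
}
```

Using the 8-bit pattern from the quoted comment, `0 1 1 0 0 0 0 0` (`0x60`) yields `0 0 1 0 0 0 0 0` (`0x20`), matching the example in the review.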
> src/hotspot/share/utilities/intn_t.hpp line 44: > >> 42: // Implementation-wise, this class currently only supports 0 < nbits <= 8. Also >> 43: // note that this class is implemented so that overflows in alrithmetic >> 44: // operations are well-defined and wrap-around. > > Suggestion: > > // operations are well-defined and wrap-around, just like jint, juint, jlong and julong. Overflow in `jint` and `jlong` is actually UB. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073291807 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073292416 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073293867 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073294529 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073296022 From rcastanedalo at openjdk.org Mon May 5 11:53:11 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 5 May 2025 11:53:11 GMT Subject: RFR: 8354520: IGV: dump contextual information [v5] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 14:14:53 GMT, Emanuel Peter wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Document workaround for lldb issue > > src/hotspot/share/opto/compile.cpp line 5207: > >> 5205: >> 5206: // Called from debugger. Prints method to the default file with the default phase name. >> 5207: // This works regardless of any Ideal Graph Visualizer flags set or not. > > Suggestion: > > // This works regardless of any Ideal Graph Visualizer flags set or not. > // Use in debugger (gdb / rr): p igv_print($sp, $fp, $pc) Done (commit 6b8e659). > src/hotspot/share/opto/compile.cpp line 5222: > >> 5220: // the network flags for the Ideal Graph Visualizer, or to the default file depending on the 'network' argument. 
>> 5221: // This works regardless of any Ideal Graph Visualizer flags set or not. >> 5222: void igv_print(bool network, void* sp, void* fp, void* pc) { > > Suggestion: > > // Use in debugger (gdb / rr): p igv_print(true, $sp, $fp, $pc) > void igv_print(bool network, void* sp, void* fp, void* pc) { Done (commit 6b8e659). > src/hotspot/share/opto/compile.cpp line 5231: > >> 5229: } >> 5230: >> 5231: // Same as igv_print(bool network) above but with a specified phase name. > > Suggestion: > > // Same as igv_print(bool network, void* sp, void* fp, void* pc) above but with a specified phase name. > // Use in debugger (gdb / rr): p igv_print(true, "MyPhase", $sp, $fp, $pc) Done (commit 6b8e659). > src/hotspot/share/opto/compile.cpp line 5248: > >> 5246: // Called from debugger, especially when replaying a trace in which the program state cannot be altered like with rr replay. >> 5247: // A method is appended to an existing default file with the default phase name. This means that igv_append() must follow >> 5248: // an earlier igv_print(*) call which sets up the file. This works regardless of any Ideal Graph Visualizer flags set or not. > > Suggestion: > > // an earlier igv_print(*) call which sets up the file. This works regardless of any Ideal Graph Visualizer flags set or not. > // Use in debugger (gdb / rr): p igv_append($sp, $fp, $pc) Done (commit 6b8e659). > src/hotspot/share/opto/compile.cpp line 5254: > >> 5252: } >> 5253: >> 5254: // Same as igv_append() above but with a specified phase name. > > Suggestion: > > // Same as igv_append(void* sp, void* fp, void* pc) above but with a specified phase name. > // Use in debugger (gdb / rr): p igv_append("MyPhase", $sp, $fp, $pc) Done (commit 6b8e659). > src/hotspot/share/opto/idealGraphPrinter.cpp line 380: > >> 378: print_prop(COMPILATION_PROCESS_ID_PROPERTY, os::current_process_id()); >> 379: print_prop(COMPILATION_THREAD_ID_PROPERTY, os::current_thread_id()); >> 380: > > What about CPU features? 
Could be nice to know if we have `avx2` or `asimd`, etc. Done (commit cb781a95). > src/hotspot/share/opto/idealGraphPrinter.cpp line 907: > >> 905: } >> 906: >> 907: static bool skip_frame(const char* name) { > > Nit: the name suggests that this is "skipping a frame". But you are asking if we "should skip the frame". So I would recommend a name like `is_skip_frame`, `must_skip_frame`, `should_skip_frame` or alike. > > You could even invert the condition, and make the condition positive: `can_print_stack_frame`. > > Totally optional, up to you :) Thanks, I went with `should_skip_frame` (commit 1210180b). > src/hotspot/share/opto/idealGraphPrinter.cpp line 917: > >> 915: static bool stop_frame_walk(const char* name) { >> 916: return strstr(name, "C2Compiler::compile_method") != nullptr; >> 917: } > > Nit: You could write `must_stop_frame_walk`. Btw, it could be nice to have a comment to explain why this is the condition to stop on. Maybe that comment could then turn into an even better method name? Thanks, I went with `should_end_stack_walk`, and added a comment as requested (commit 1210180b). > src/hotspot/share/opto/idealGraphPrinter.cpp line 921: > >> 919: void IdealGraphPrinter::print_stack(frame fr, outputStream* graph_name) { >> 920: char buf[O_BUFLEN]; >> 921: Thread* _current = Thread::current_or_null(); > > Is this a local variable? If so, I would drop the underscore. It generally suggests that it is a field, right? > It would also not hurt to call it `current_thread` for clarity. Thanks, the underscore was just a leftover from copying some code from `NativeStackPrinter::print_stack_from_frame()`. I just inlined `Thread::current_or_null()` into its only use, which I think is even clearer (commit `13604b75`). > src/hotspot/share/opto/idealGraphPrinter.cpp line 924: > >> 922: int count = 0; >> 923: int frame = 0; >> 924: while (count++ < StackPrintLimit && fr.pc() != nullptr) { > > Could this be formulated as a `for` loop? 
> Suggestion: > > for (int count = 0; count < StackPrintLimit && fr.pc() != nullptr; count++) { > > Just an idea, totally optional. Done, thanks (commit 9cd331d2). > src/hotspot/share/opto/idealGraphPrinter.hpp line 122: > >> 120: // graph_name == nullptr) or the graph name based on the highest C2 frame (if >> 121: // graph_name != nullptr). >> 122: void print_stack(frame fr, outputStream* graph_name); > > Are you passing this `frame` by value on purpose? I refactored this code in commit 6fa328e0 to pass the initial frame by reference. > src/hotspot/share/opto/node.cpp line 1799: > >> 1797: const char* _options; >> 1798: outputStream* _output; >> 1799: frame* _frame; > > Could this be `const`, like the other pointers? Done (commit 3c3108b8), thanks. > src/hotspot/share/opto/node.cpp line 2428: > >> 2426: } >> 2427: >> 2428: // Call this from debugger, with stack handling register arguments for IGV dumps. > > Suggestion: > > // Call this from debugger, with stack handling register arguments for IGV dumps. > // Example: p find_node(741)->dump_bfs(7, find_node(741), "c+A!", $sp, $fp, $pc) Done (commit 6b8e659). 
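As a rough illustration of the stack-walking predicates and the `for` loop discussed in the idealGraphPrinter.cpp comments above, here is a standalone sketch that operates on plain frame names. Only the `C2Compiler::compile_method` check in `should_end_stack_walk` is taken from the quoted code; the skip criterion, the `collect_frames` helper, and the decision to include the terminating frame are assumptions made for this example and may differ from the actual patch.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical skip rule (assumption): hide the printer's own frames.
static bool should_skip_frame(const char* name) {
  return std::strstr(name, "IdealGraphPrinter::") != nullptr;
}

// Taken from the quoted review snippet: stop at the C2 compilation entry.
static bool should_end_stack_walk(const char* name) {
  return std::strstr(name, "C2Compiler::compile_method") != nullptr;
}

// Inspect at most `limit` frames (innermost first) and keep the relevant ones.
// Hypothetical helper standing in for the real frame walk.
std::vector<std::string> collect_frames(const std::vector<std::string>& names,
                                        int limit) {
  std::vector<std::string> kept;
  const int n = static_cast<int>(names.size());
  for (int count = 0; count < limit && count < n; count++) {
    const char* name = names[count].c_str();
    if (should_skip_frame(name)) {
      continue;
    }
    kept.push_back(names[count]);
    if (should_end_stack_walk(name)) {
      break;  // assumption: the terminating frame itself is still reported
    }
  }
  return kept;
}
```

For a walk over `IdealGraphPrinter::print_stack`, `Compile::Optimize`, `C2Compiler::compile_method`, `CompileBroker::invoke_compiler_on_method`, this sketch keeps only the two middle frames.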
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073286020
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073286083
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073286175
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073286248
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073286345
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073287440
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073288991
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073290430
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073293032
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073293540
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073296132
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073297968
PR Review Comment: https://git.openjdk.org/jdk/pull/24724#discussion_r2073286585

From rcastanedalo at openjdk.org Mon May 5 11:52:55 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 5 May 2025 11:52:55 GMT
Subject: RFR: 8354520: IGV: dump contextual information [v5]
In-Reply-To: <5V9ebfuzDLssnpTsSlCsGIGKs71Ic_YZV9dO2F7J21c=.cb97f630-8784-48e0-94f1-de7c0262daea@github.com>
References: <5V9ebfuzDLssnpTsSlCsGIGKs71Ic_YZV9dO2F7J21c=.cb97f630-8784-48e0-94f1-de7c0262daea@github.com>
Message-ID:

On Wed, 30 Apr 2025 14:30:31 GMT, Roberto Castañeda Lozano wrote:

>> Commit dd1ad6a documents how to dump the C2 stack trace when using `lldb` (default debugger on macOS platforms). Thanks @dafedafe for reporting and helping out!
>
>> Thanks a lot for adding this feature @robcasloz.
>
> Thanks for reviewing, Damon!

> Nice work @robcasloz !
> > I just left a few suggestions below, but I think they are basically all nits / optional :) Thanks Emanuel, I addressed your comments and suggestions, including dumping of CPU features (as seen by C2, i.e. `Abstract_VM_Version::_features_string`). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24724#issuecomment-2850719472 From chagedorn at openjdk.org Mon May 5 12:00:45 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 May 2025 12:00:45 GMT Subject: RFR: 8356122: Client build fails after JDK-8350209 In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:49:41 GMT, Aleksey Shipilev wrote: > See bug for samples of build failures. I reproduced and fixed both with this PR. > > Additional testing: > - [x] Linux x86_64 client release build > - [x] Linux x86_64 client fastdebug build Looks good and trivial! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25030#pullrequestreview-2814571379 From chagedorn at openjdk.org Mon May 5 12:07:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 May 2025 12:07:47 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v3] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 09:23:05 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. 
The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`.
>>
>> Fix: The fix involves refining the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop.
>>
>> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in `expand_macro_nodes` is unintuitive, as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), and the current fix should be moved along with the other elimination code.
>
> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision:
>
>   moving the fix to a separate method

Otherwise, looks good, thanks for the update!

src/hotspot/share/opto/macro.cpp line 2354:

> 2352:
> 2353: void PhaseMacroExpand::refine_strip_mined_loop_macro_node() {
> 2354:   // Perform refining of strip mined loop node in the macro nodes list.

Should probably be added as method comment:
Suggestion:

// Perform refining of strip mined loop node in the macro nodes list.
void PhaseMacroExpand::refine_strip_mined_loop_macro_node() {

src/hotspot/share/opto/macro.hpp line 205:

> 203:     _igvn.set_delay_transform(true);
> 204:   }
> 205:   void refine_strip_mined_loop_macro_node();

Suggestion:

void refine_strip_mined_loop_macro_node();

-------------

Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24890#pullrequestreview-2814584871 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2073315860 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2073316455 From aboldtch at openjdk.org Mon May 5 12:18:45 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 5 May 2025 12:18:45 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 02:29:34 GMT, Quan Anh Mai wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. 
>>
>> Best Regards,
>> Jatin
>>
>> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B
>> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873
>>
>>
>> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs.
>
> What I meant is that we should map a relocation to BOTH the instruction start and the patch site. APX has not even been released yet, so I think it is more efficient to make a better fix than to make a quicker one.

I think @merykitty's solution with two different relocations based on whether we support APX or not. And only emit the after and nop when `VM_Version::supports_apx_f()` is true.

On the other hand, maybe we can solve this with a minimal change by simply looking for the REX2 prefix when we patch the code. Something along the lines of:

diff --git a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
index 9cdf0b229c0..4a956b450bd 100644
--- a/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
+++ b/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp
@@ -1328,7 +1328,13 @@ void ZBarrierSetAssembler::patch_barrier_relocation(address addr, int format) {
   const uint16_t value = patch_barrier_relocation_value(format);
   uint8_t* const patch_addr = (uint8_t*)addr + offset;
   if (format == ZBarrierRelocationFormatLoadGoodBeforeShl) {
-    *patch_addr = (uint8_t)value;
+    if (VM_Version::supports_apx_f()) {
+      NativeInstruction* instruction = nativeInstruction_at(addr);
+      uint8_t* const rex2_patch_addr = patch_addr + (instruction->has_rex2_prefix() ? 1 : 0);
+      *rex2_patch_addr = (uint8_t)value;
+    } else {
+      *patch_addr = (uint8_t)value;
+    }
   } else {
     *(uint16_t*)patch_addr = value;
   }

As for the solution to have the relocation point at the entry: while they were not designed to be used this way, it looks like it works.
(At least from a barrier patching point of view, as we only want to iterate over all relocations, never map a PC to a relocation). But changing invariants is scary, and it is probably better evaluated as a part of the [JDK-8355341](https://bugs.openjdk.org/browse/JDK-8355341) RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2850807205 From duke at openjdk.org Mon May 5 12:22:46 2025 From: duke at openjdk.org (Manuel Hässig) Date: Mon, 5 May 2025 12:22:46 GMT Subject: RFR: 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java In-Reply-To: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> References: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> Message-ID: On Thu, 1 May 2025 11:31:50 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Originally, I was going to enable all test cases on riscv in this test file. But it seems there was already an attempt to implement RoundDoubleModeV (which is IRNode.ROUND_DOUBLE_MODE_V) in https://github.com/openjdk/jdk/pull/21164, which failed because of some performance regression. > So I'll just enable part of the test cases in this PR. > > Thanks! Looks good to me as well. I also ran some testing that passed fine. ------------- Marked as reviewed by mhaessig at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/24983#pullrequestreview-2814621074 From jbhateja at openjdk.org Mon May 5 12:31:30 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 12:31:30 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v6] In-Reply-To: References: Message-ID: <7bjbWKwHAmQtdTs2rC5pRdFF7tje9hLrPx3Rx4wfIHU=.3e881f87-3258-4e1c-93ea-9a94a2913a02@github.com> > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long.compress APIs.
> > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. The existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of the result based on the upper and lower bounds of the mask type. > > The new IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Fine tuning the hi bound computation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/18b5c239..73f9749a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=04-05 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From coleenp at openjdk.org Mon May 5 12:32:24 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 5 May 2025 12:32:24 GMT Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical Message-ID: Please review this possibly trivial change. Tested with hs-precheckin-comp test list.
------------- Commit messages: - 8356172: IdealGraphPrinter doesn't need ThreadCritical Changes: https://git.openjdk.org/jdk/pull/25035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356172 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25035/head:pull/25035 PR: https://git.openjdk.org/jdk/pull/25035 From kxu at openjdk.org Mon May 5 12:38:51 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 May 2025 12:38:51 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 16:42:41 GMT, Emanuel Peter wrote: >> Hello @eme64. I pinged you in [an in-line review](https://github.com/openjdk/jdk/pull/23506#discussion_r2042974649). Could you please provide some comments on this assertion? This is currently blocking my progress and breaking the build. Thank you very much! > > @tabjy Thanks for your patience, this one took me longer than I wanted. I responded like this above: > >> Hmm, ok I see. Why don't you remove the asserts for now, and we see how clear the code looks now. I think I asked for the consistency check because I was confused by the previous code structure. Maybe it is ok now as it is. Sorry I forgot to ping you @eme64 this time. I addressed all change requests as best as I could and I think another review would be appropriate. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2850856305 From qamai at openjdk.org Mon May 5 12:42:24 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 5 May 2025 12:42:24 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong.
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add more intn_t tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/6be30c51..77aa4062 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=63 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=62-63 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Mon May 5 12:42:24 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 5 May 2025 12:42:24 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v53] In-Reply-To: References: <4J2bhK1v1UcrBUtW7tSR6KR9lcYZ2IrUYxeic6vwfZg=.2c3489ac-dd4e-4a7b-97d6-5deae5223354@github.com> Message-ID: On Fri, 2 May 2025 15:39:08 GMT, Emanuel Peter wrote: >> @eme64 Please let me know if you disagree with any answer from me. 
I am fairly confident in this patch, especially with the exhaustive tests exercising `intn_t` values. After this patch, I will work on allowing the test infrastructure to work with `Type` instances directly, templatizing `TypeInt` and `TypeLong` so that we can work with `TypeInt>`. > > @merykitty Ok, that is all I can do this week, enjoy the weekend > If I don't resume commenting on Monday then feel free to ping me with a reminder @eme64 Thanks for not forgetting me :) I have answered your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2850864452 From qamai at openjdk.org Mon May 5 12:42:24 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 5 May 2025 12:42:24 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v57] In-Reply-To: References: <-RjffrOs-0kK0Wy2eLL-4FfPQt3Wf98q9y40b229M1A=.25a220cf-a07a-4012-addc-c4897ef43133@github.com> Message-ID: On Mon, 5 May 2025 07:10:28 GMT, Emanuel Peter wrote: >> The thing is that the other operations are so trivial that it would be counter-productive to test them, it is like testing `add(int x, int y) { return x + y; }` :) The operations I test here are the non-trivial ones, that is sign extension and comparison. I have added some sanity `static_assert` to catch off-by-one errors, though. > > Well, it could be relatively easy to get a `>>` or equal operator wrong, because of the higher bits. > I tend to get these things wrong, and tests save me there. You are not me, so I leave it up to you ;) For signed integers, they are tricky, that's why I have tests for comparisons, and `>>` is not implemented yet. For unsigned ones, though, we simply chop off the higher bits so it is much less of a concern. I have added some more tests for the cases where the input integer is outside the range of the `intn_t` so that the results need to be wrapped.
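The wrap-around behaviour discussed in this message (the unsigned case chops off the higher bits, the signed case additionally sign-extends from bit `nbits - 1`) can be sketched roughly as follows. This is a standalone illustration only, not the `intn_t` class from the patch; the helper names `wrap_unsigned` and `wrap_signed` are hypothetical, and the sketch assumes the same `0 < nbits <= 8` restriction mentioned in the review.

```cpp
// Hypothetical helpers, NOT the intn_t implementation from the PR.

// Keep only the low nbits of v, i.e. "chop off the higher bits".
template <unsigned nbits>
constexpr unsigned wrap_unsigned(unsigned v) {
  static_assert(nbits > 0 && nbits <= 8, "sketch only covers 0 < nbits <= 8");
  return v & ((1u << nbits) - 1u);
}

// Wrap v to nbits and sign-extend from bit (nbits - 1).
// XOR with the sign bit followed by subtracting it maps
// [0, 2^nbits) onto [-2^(nbits-1), 2^(nbits-1)).
template <unsigned nbits>
constexpr int wrap_signed(int v) {
  return static_cast<int>(wrap_unsigned<nbits>(static_cast<unsigned>(v)) ^ (1u << (nbits - 1)))
         - static_cast<int>(1u << (nbits - 1));
}

// Inputs outside the 3-bit range are wrapped; in-range inputs are unchanged.
static_assert(wrap_unsigned<3>(9u) == 1u, "9 mod 8 == 1");
static_assert(wrap_signed<3>(5) == -3, "0b101 is -3 as a 3-bit signed value");
static_assert(wrap_signed<3>(-1) == -1, "in-range values are unchanged");
```

With 3 bits the signed range is [-4, 3], so an input of 4 wraps to -4, which is the kind of out-of-range case the added tests are described as covering.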
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073360162 From thartmann at openjdk.org Mon May 5 12:44:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 5 May 2025 12:44:51 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v13] In-Reply-To: <9lzS-yIyl2z6YFwV9VpAyaYcnLujK1gwamhBVSmngeA=.b209089f-69eb-4ea0-8171-c25fbc1f28fe@github.com> References: <9lzS-yIyl2z6YFwV9VpAyaYcnLujK1gwamhBVSmngeA=.b209089f-69eb-4ea0-8171-c25fbc1f28fe@github.com> Message-ID: On Mon, 7 Apr 2025 06:08:55 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Andrey Turbanov Looks good to me! I just found a few minor typos. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 35: > 33: > 34: /** > 35: * The {@link Verify} class provide {@link Verify#checkEQ} and {@link Verify#checkEQWithRawBits}, Suggestion: * The {@link Verify} class provides {@link Verify#checkEQ} and {@link Verify#checkEQWithRawBits}, test/hotspot/jtreg/compiler/lib/verify/Verify.java line 36: > 34: /** > 35: * The {@link Verify} class provide {@link Verify#checkEQ} and {@link Verify#checkEQWithRawBits}, > 36: * which recursively compare the two {@link Object}s by value. They deconstruct an array of objects, Suggestion: * which recursively compare the two {@link Object}s by value. 
They deconstruct an array of objects, test/hotspot/jtreg/compiler/lib/verify/Verify.java line 38: > 36: * which recursively compare the two {@link Object}s by value. They deconstruct an array of objects, > 37: * compare boxed primitive types, compare the content of arrays and {@link MemorySegment}s, and check > 38: * that the messages of two {@link Exception}s are equal. They also checks for the equivalent content Suggestion: * that the messages of two {@link Exception}s are equal. They also check for the equivalent content test/hotspot/jtreg/compiler/lib/verify/Verify.java line 486: > 484: } > 485: > 486: Suggestion: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2814621188 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2073336816 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2073337119 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2073339500 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2073354421 From shade at openjdk.org Mon May 5 12:53:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 12:53:53 GMT Subject: RFR: 8356122: Client build fails after JDK-8350209 In-Reply-To: References: Message-ID: <_-rCKYOPTacyyHKJSITdUAvInfYqVTQ7qjv_0jCTJeY=.98e9f0f3-bcac-49c1-bdd3-bd2675969db1@github.com> On Mon, 5 May 2025 09:49:41 GMT, Aleksey Shipilev wrote: > See bug for samples of build failures. I reproduced and fixed both with this PR. > > Additional testing: > - [x] Linux x86_64 client release build > - [x] Linux x86_64 client fastdebug build Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25030#issuecomment-2850894167 From shade at openjdk.org Mon May 5 12:53:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 12:53:54 GMT Subject: Integrated: 8356122: Client build fails after JDK-8350209 In-Reply-To: References: Message-ID: <7WIHFwxah3fpqvtsm7npoP5nadFADb87ULTw1FTF3A0=.acb38766-195f-41a3-9a46-4160f9ed4d55@github.com> On Mon, 5 May 2025 09:49:41 GMT, Aleksey Shipilev wrote: > See bug for samples of build failures. I reproduced and fixed both with this PR. > > Additional testing: > - [x] Linux x86_64 client release build > - [x] Linux x86_64 client fastdebug build This pull request has now been integrated. Changeset: 1501a5e4 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/1501a5e41e59162a374cf5b8cfc37faced48a6ed Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8356122: Client build fails after JDK-8350209 Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25030 From epeter at openjdk.org Mon May 5 13:02:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 13:02:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v14] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply Tobias' review suggestions Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/49f6789c..187aa54f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Mon May 5 13:06:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 13:06:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v15] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 17 additional commits since the last revision: - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects - Apply Tobias' review suggestions Co-authored-by: Tobias Hartmann - Apply suggestions from code review Co-authored-by: Andrey Turbanov - For Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix whitespace issues - Updates for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - refactor with checkEQWithRawBits - upate copyright - ... and 7 more: https://git.openjdk.org/jdk/compare/191ca09e...23a22389 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/187aa54f..23a22389 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=13-14 Stats: 319875 lines in 3263 files changed: 105989 ins; 202060 del; 11826 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Mon May 5 13:16:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 13:16:07 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:42:24 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. 
an int that lies in [0, 3] in the signed domain must also lie in [0, 3] in the unsigned domain and have all bits but the last 2 unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> This is extracted from #15440; node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add more intn_t tests @merykitty Ok then, thanks for working through all this with me! I'm really happy with the result now, and I feel like I learned a lot from you. @vnkozlov You might want to have another pass over it, as we reworked quite a bit. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-2814771554 From epeter at openjdk.org Mon May 5 13:16:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 13:16:09 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v62] In-Reply-To: <1pXrKl_x2o6pUZTRPuQYybroDad0b5k3SslhVE0Rhl8=.5eb36e99-2018-4686-8085-fd4b425327dc@github.com> References: <1pXrKl_x2o6pUZTRPuQYybroDad0b5k3SslhVE0Rhl8=.5eb36e99-2018-4686-8085-fd4b425327dc@github.com> Message-ID: On Mon, 5 May 2025 11:47:17 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/rangeinference.cpp line 480: >> >>> 478: if (h < ~bounds._hi) { >>> 479: return AdjustResult>::make_empty(); >>> 480: } >> >> Another nit: I feel like the "overflow" case here would not have to spill outside of `adjust_lo`.
>> An `Optional`-style return value would make more sense for the reader at this point, so the reader does not have to worry about why we do a comparison here, and does not have to dive deeper into `adjust_lo`. >> >> I leave this up to you though. > > That's a good point, however I think I will do this later as we don't have an `Optional` in Hotspot yet. Oh, you basically already defined "Optional" whenever you have a `_present` flag. >> src/hotspot/share/utilities/intn_t.hpp line 44: >> >>> 42: // Implementation-wise, this class currently only supports 0 < nbits <= 8. Also >>> 43: // note that this class is implemented so that overflows in alrithmetic >>> 44: // operations are well-defined and wrap-around. >> >> Suggestion: >> >> // operations are well-defined and wrap-around, just like jint, juint, jlong and julong. > > Overflow in `jint` and `jlong` is actually UB. Ah right. Only well defined with `java_add` etc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073421714 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r2073420231 From epeter at openjdk.org Mon May 5 13:21:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 13:21:47 GMT Subject: RFR: 8354520: IGV: dump contextual information [v6] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 11:37:04 GMT, Roberto Castañeda Lozano wrote: >> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced.
The changeset dumps, for every compilation, the following additional properties: >> >> - JVM arguments >> - platform information >> - JVM version information >> - date and time >> - process ID >> - (compiler) thread ID >> >> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f) >> >> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped: >> >> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01) >> >> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter): >> >> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6) >> >> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.: >> >> >> define igv >> p igv_print(true, $sp, $fp, $pc) >> end >> >> define igv_node >> p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc) >> end >> >> >> Thanks to @TobiHartmann for providing useful feedback! >> >> #### Testing >> >> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). >> - Tested interactive usage manually via `gdb` and `rr` on linux-x64. >> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure. 
> > Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision: > > - Dump CPU features > - Make frame pointer parameters const whenever possible > - Pass pointer to initial frame to print_stack > - Refactor loop > - Inline _current into its only use > - Improve naming and commenting of stack-walking predicates > - Extend comments with debugger usage examples @robcasloz Amazing, thanks for the updates :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24724#pullrequestreview-2814789769 From duke at openjdk.org Mon May 5 13:29:57 2025 From: duke at openjdk.org (snake66) Date: Mon, 5 May 2025 13:29:57 GMT Subject: RFR: 8356182: Build fails on aarch64 without ZGC Message-ID: jvmciCodeInstaller_aarch64.cpp references symbols defined by ZGC unconditionally, causing the build to fail when ZGC is not included. This work is sponsored by The FreeBSD Foundation. ------------- Commit messages: - 8356182: Build fails on aarch64 without ZGC Changes: https://git.openjdk.org/jdk/pull/25039/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25039&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356182 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25039/head:pull/25039 PR: https://git.openjdk.org/jdk/pull/25039 From stefank at openjdk.org Mon May 5 13:46:48 2025 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 5 May 2025 13:46:48 GMT Subject: RFR: 8356182: Build fails on aarch64 without ZGC In-Reply-To: References: Message-ID: On Mon, 5 May 2025 13:24:51 GMT, snake66 wrote: > jvmciCodeInstaller_aarch64.cpp references symbols defined by ZGC unconditionally, causing the build to fail when ZGC is not included. > > This work is sponsored by The FreeBSD Foundation. Looks good to me. Needs another review from a compiler dev as well.
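The failure mode described in this thread (code that unconditionally references symbols of an optional component) is commonly avoided by guarding the reference with a build-time feature macro. The sketch below illustrates that general pattern only; it is not the actual change in the PR. The macro is given a default here purely so the sketch compiles on its own, and `z_patch_kind` and `describe_patch` are hypothetical names.

```cpp
#include <string>

// Assumption for this sketch: a build-time feature flag that is 1 when the
// optional component is compiled in, and 0 otherwise. It defaults to 0 here
// so the example is self-contained.
#ifndef INCLUDE_ZGC
#define INCLUDE_ZGC 0
#endif

#if INCLUDE_ZGC
// Hypothetical symbol that only exists when the feature is part of the build.
static const char* z_patch_kind() { return "z-barrier"; }
#endif

// Hypothetical caller: the reference to z_patch_kind() is guarded, so this
// translation unit still compiles and links when the feature is left out.
static std::string describe_patch(bool is_z_barrier) {
#if INCLUDE_ZGC
  if (is_z_barrier) {
    return z_patch_kind();
  }
#else
  (void)is_z_barrier;  // unused when the feature is compiled out
#endif
  return "generic";
}
```

Without the guard around the call, a build that omits the component would fail on the unresolved reference, which matches the symptom reported in the bug.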
------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25039#pullrequestreview-2814870397 From duke at openjdk.org Mon May 5 13:50:45 2025 From: duke at openjdk.org (snake66) Date: Mon, 5 May 2025 13:50:45 GMT Subject: RFR: 8356182: Build fails on aarch64 without ZGC In-Reply-To: References: Message-ID: <_VwRtES2lVww-_VOLFjT67PIMv9w5YHg4cugFYWF56A=.c8203eff-1b70-4181-aa64-0d344bd6c1b5@github.com> On Mon, 5 May 2025 13:43:56 GMT, Stefan Karlsson wrote: >> jvmciCodeInstaller_aarch64.cpp references symbols defined by ZGC unconditionally, causing the build to fail when ZGC is not included. >> >> This work is sponsored by The FreeBSD Foundation. > > Looks good to me. Needs another review from a compiler dev as well. @stefank Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25039#issuecomment-2851073034 From epeter at openjdk.org Mon May 5 13:53:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 13:53:53 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12] In-Reply-To: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com> References: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com> Message-ID: On Sat, 3 May 2025 17:29:39 GMT, Jasmine Karthikeyan wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Whitespace and benchmark tweak > > Thanks a lot for running the benchmark on your AVX512 machine! The results are very interesting: in the char cases it looks like we over-unroll the loop with SuperWord enabled even though we don't end up vectorizing the loop; fixing that could solve the slowdown. Since you mentioned the unroll amount was 32x, it might be unrolling to fill a vector (`512/sizeof(char) = 32`). > >> Wait, but you seem to say that you want to support `casting to T_CHAR`.
But is the issue not casting FROM char?
>
> You are correct, I think that is my mistake. It looks like casting to char is supported because stores to both short and char become `StoreC`, but casting from char isn't supported because we have no `VectorCastC2X` node. I'll update the bug to make it more accurate.
>
> I've also pushed a small commit to remove some extra whitespace and to make the benchmark run faster.

@jaskarth Just checked the internal testing. Saw this failure with `-XX:UseAVX=1`:

Failed IR Rules (2) of Methods (2)
----------------------------------
1) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testByteToLong(byte[],long[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_B2L#_", "_@min(max_byte, max_long)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(VectorCastB2X.*)+(\\s){2}===.*vector[A-Za-z])"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
2) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testLongToByte(long[],byte[])" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_L2B#_", "_@min(max_long, max_byte)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(VectorCastL2X.*)+(\\s){2}===.*vector[A-Za-z])"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2851082595 From rehn at openjdk.org Mon May 5 14:15:50 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 5 May 2025 14:15:50 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> References: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> Message-ID: <8yKly3rraQHaFVLDz1x_1p9LgmrTrrenmyJcMuXJ52k=.ed75a499-9f76-483d-80ea-ab61edcd0337@github.com> On Mon, 5 May 2025 10:17:27 GMT, Yuri Gaevsky wrote: >> The patch adds the possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > change slli+add sequence to shadd

I think what @RealFYang is saying: You don't need to know the vector size, i.e.:

const int nof_vec_elems = MaxVectorSize;
....
mv(t1, nof_vec_elems); vsetvli(t0, t1, Assembler::e32, Assembler::m4); You can set vsetvli to cnt rounded down to the nearest 4 bytes. And let vsetvli process as much as it can per iteration. It will never process more than vlen, so in the last loop iteration it may process only 4 bytes. Here is an example of a memcopy: https://github.com/riscvarchive/riscv-v-spec/blob/master/example/memcpy.s This means the main loop is vector register length agnostic. Now you have 3 or fewer bytes left to process with normal scalar ops. ------------- PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-2814965363 From jbhateja at openjdk.org Mon May 5 14:21:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 14:21:07 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v7] In-Reply-To: References: Message-ID: <7sL3-TEh2o6nT6GvvjYUpQfBbqbzeXgrJST9JeAcjLc=.df22b69d-6087-44c6-883d-e0604b92a44d@github.com> > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long.compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding additional test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/73f9749a..4c4d1688 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=05-06 Stats: 18 lines in 2 files changed: 17 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From chagedorn at openjdk.org Mon May 5 14:26:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 5 May 2025 14:26:47 GMT Subject: RFR: 8356182: Build fails on aarch64 without ZGC In-Reply-To: References: Message-ID: <1EsTsRhVHlC5lsFXRZXBGff4nOPUj-hyAE6VivJpw3w=.4fcfc225-2bda-403b-bf26-6d01347d5e38@github.com> On Mon, 5 May 2025 13:24:51 GMT, snake66 wrote: > jvmciCodeInstaller_aarch64.cpp references symbols defined by the ZGC unconditionally, causing the build to fail when ZGC is not included. > > This work is sponsored by The FreeBSD FOundation Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25039#pullrequestreview-2815008769 From jbhateja at openjdk.org Mon May 5 14:26:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 5 May 2025 14:26:51 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 10:40:02 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resoultions > >> Let me tune this check and update the test. > > For me to approve this code, you will have to do more than that. 
I will need: > - Proof of the implemented logic. > - More tests. Hi @eme64 , can you kindly run this latest version through your testing, please. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2851179612 From epeter at openjdk.org Mon May 5 14:26:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 5 May 2025 14:26:59 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v15] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 18:04:38 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > remove asserts, add more documentation Did a quick pass over half of the code before finishing for the day. I still think that the patterns covered are more limited than preferable. It would be nice if we could cover more cases, especially more cases where the RHS has different patterns too. 
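As an aside on the shift overflow mentioned in the quoted PR description above (`(1 << 32) = 1` and `(int) (1L << 32) = 0`), the difference can be reproduced in plain Java. This is only a minimal sketch: the class and method names are made up for illustration, and it shows Java language semantics rather than the C++ code inside C2.

```java
public class ShiftOverflowDemo {
    // Multiplier implied by an int shift `a << shift`.
    // For int operands Java uses only the low 5 bits of the shift count,
    // so a count of 32 behaves like a count of 0.
    static long intShiftMultiplier(int shift) {
        return (long) (1 << shift);
    }

    // Multiplier computed in long arithmetic and then narrowed to int.
    // For long operands Java uses the low 6 bits of the shift count,
    // so 1L << 32 is 2^32, which narrows to 0.
    static long narrowedLongShiftMultiplier(int shift) {
        return (long) (int) (1L << shift);
    }

    public static void main(String[] args) {
        System.out.println("int shift by 32:       " + intShiftMultiplier(32));
        System.out.println("long shift, narrowed:  " + narrowedLongShiftMultiplier(32));
        System.out.println("int shift by 3:        " + intShiftMultiplier(3));
        System.out.println("long shift 3, narrowed: " + narrowedLongShiftMultiplier(3));
    }
}
```

For shift counts below 31 both helpers agree, which is why the discrepancy only shows up at the boundary the new tests target.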
One question: what if LHS and RHS are flipped, can we recognize patterns like this: `a + (a << CON)`? src/hotspot/share/opto/addnode.cpp line 417: > 415: // => n*a > 416: // > 417: // Due to the iterative nature of iGVN, MulNode transformed from first few AddNode terms may be further transformed into Suggestion: // Due to the iterative nature of iGVN, MulNode transformed from first few AddNode terms may be further transformed into Suggestion: // Due to the iterative nature of IGVN, MulNode transformed from first few AddNode terms may be further transformed into src/hotspot/share/opto/addnode.cpp line 425: > 423: // - (2) Simple lshift: a << CON > 424: // - (3) Simple multiplication: CON * a > 425: // - (4) Power-of-two addition: (a << CON1) + (a << CON2) Suggestion: // - (1) Simple addition: LHS = a + a // - (2) Simple lshift: LHS = a << CON // - (3) Simple multiplication: LHS = CON * a // - (4) Power-of-two addition: LHS = (a << CON1) + (a << CON2) I suggest adding the `LHS =` explicitly here. I tend to not always read all the explanatory text, and search for patterns first. Then it is not immediately clear that you are actually matching a `a + a + a` rather than a `a + a` for `1)`. src/hotspot/share/opto/addnode.cpp line 428: > 426: // > 427: // Note this also converts, for example, original expression `(a*3) + a` into `4*a` and `(a<<2) + a` into `5*a`. A more > 428: // generalized pattern `(a*b) + (a*c)` into `a*(b + c)` is handled by AddNode::IdealIL(). What about the pattern `(a * CON1) + (a << CON2)`? Is that handled here or by `AddNode::IdealIL()`? src/hotspot/share/opto/addnode.cpp line 438: > 436: > 437: Node* in1 = in(1); > 438: Node* in2 = in(2); For consistency, you should either only use `LHS` / `RHS`, or only `in1` / `in2`. src/hotspot/share/opto/addnode.cpp line 440: > 438: Node* in2 = in(2); > 439: > 440: // (1) Simple addition pattern (e.g., a + a) Suggestion: // (1) Simple addition pattern (e.g., in1 = a + a) Do the same below. 
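The algebraic identities behind the patterns discussed above (`(a*3) + a` into `4*a`, `(a<<2) + a` into `5*a`, and the flipped form `a + (a << CON)`) can be sanity-checked in Java, since int arithmetic wraps modulo 2^32 on both sides. This is a hedged sketch with hypothetical helper names. It only checks that the rewrites preserve values; it says nothing about which of them C2 actually performs.

```java
public class AddSeriesIdentityDemo {
    // The flipped pattern from the review question: a + (a << con).
    static int flippedShiftAdd(int a, int con) {
        return a + (a << con);
    }

    // The same value written as one multiplication: ((1 << con) + 1) * a.
    static int asMultiply(int a, int con) {
        return ((1 << con) + 1) * a;
    }

    // Checks the identities for one value of `a`, including overflow cases.
    static boolean holdsFor(int a) {
        if ((a * 3) + a != 4 * a) return false;
        if ((a << 2) + a != 5 * a) return false;
        for (int con = 0; con < 32; con++) {
            if (flippedShiftAdd(a, con) != asMultiply(a, con)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        boolean all = true;
        for (int a : samples) {
            all &= holdsFor(a);
        }
        System.out.println("identities hold for all samples: " + all);
    }
}
```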
src/hotspot/share/opto/addnode.cpp line 481: > 479: } > 480: > 481: // Try to match `a << CON`. On success, return a struct with `.valid = true`, `variable = a`, and Suggestion: // Try to match `n = a << CON`. On success, return a struct with `.valid = true`, `variable = a`, and Do the same below. src/hotspot/share/opto/addnode.cpp line 494: > 492: } > 493: > 494: return Multiplication{}; Suggestion: return Multiplication::make_invalid(); I would prefer wrapping this in something that is immediately clear at the call site, so the reader does not have to go look at the internals of `Multiplication`. src/hotspot/share/opto/addnode.hpp line 54: > 52: inline static bool is_valid_multiplication(const Multiplication& mul, const Node* variable) { > 53: return mul.valid && mul.variable == variable; > 54: } Why not make it a `is_valid_with(variable)` method of the struct? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-2814911911 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073541357 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073539198 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073504103 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073542233 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073543232 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073546572 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073549440 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073527630 From shade at openjdk.org Mon May 5 14:51:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 14:51:44 GMT Subject: RFR: 8356153: Shenandoah stubs are missing in AOT Code Cache addresses table In-Reply-To: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> References: 
<8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> Message-ID: On Mon, 5 May 2025 09:19:40 GMT, Aleksey Shipilev wrote: > See the bug for reproducer. We actually have similar Shenandoah hunks down in Leyden repository, but we have apparently missed them when upstreaming [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209). We only need a small subset of stubs that adapters use: pre-barriers and phantom load-barriers. This matches what we do for G1 and Z as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `runtime/cds` with `-XX:+UseShenandoahGC` Attn @vnkozlov :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25028#issuecomment-2851255866 From asmehra at openjdk.org Mon May 5 15:14:47 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 5 May 2025 15:14:47 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Sat, 3 May 2025 23:30:44 GMT, Vladimir Kozlov wrote: > We need to do something about Compressed Klass base: I too feel bailing out is too restrictive. In my tests I have seen this happening too frequently to ignore. > Which blob/stubs decompress/compress klass using the base? `CompressedKlassPointers::base()` is called by C1 blob for `is_instance_of` [0]. [0] https://github.com/openjdk/jdk/blob/1501a5e41e59162a374cf5b8cfc37faced48a6ed/src/hotspot/cpu/x86/c1_Runtime1_x86.cpp#L1131 https://github.com/openjdk/jdk/blob/1501a5e41e59162a374cf5b8cfc37faced48a6ed/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L5439 > May be we should use Relocation for it. @vnkozlov How about updating `decode_klass_not_null` like this? 
if (CompressedKlassPointers::base() != nullptr) { if (AOTCodeCache::is_on_for_dump()) { movptr(tmp, ExternalAddress(CompressedKlassPointers::base_addr())); } else { mov64(tmp, (int64_t)CompressedKlassPointers::base()); } addq(r, tmp); } and adding `CompressedKlassPointers::base_addr()` to the AOTAddressTable. > I think we have relocation for CompressedOops::base() so we can patch. We have relocation for `CompressedOops::base()` only when heap is not yet initialized: void MacroAssembler::reinit_heapbase() { if (UseCompressedOops) { if (Universe::heap() != nullptr) { if (CompressedOops::base() == nullptr) { MacroAssembler::xorptr(r12_heapbase, r12_heapbase); } else { mov64(r12_heapbase, (int64_t)CompressedOops::base()); } } else { movptr(r12_heapbase, ExternalAddress(CompressedOops::base_addr())); } } } In premain we do add `CompressedOops::base_addr()` to the AOT Address table. But I don't think we are accessing `CompressedOops::base` in adapters or blobs that we are targeting in mainline. At least I haven't come across the need to have `CompressedOops::base` in the address table. Is that correct @vnkozlov @adinn ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2851321094 PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2851326869 PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2073639605 From duke at openjdk.org Mon May 5 15:19:45 2025 From: duke at openjdk.org (snake66) Date: Mon, 5 May 2025 15:19:45 GMT Subject: RFR: 8356182: Build fails on aarch64 without ZGC In-Reply-To: <1EsTsRhVHlC5lsFXRZXBGff4nOPUj-hyAE6VivJpw3w=.4fcfc225-2bda-403b-bf26-6d01347d5e38@github.com> References: <1EsTsRhVHlC5lsFXRZXBGff4nOPUj-hyAE6VivJpw3w=.4fcfc225-2bda-403b-bf26-6d01347d5e38@github.com> Message-ID: On Mon, 5 May 2025 14:24:12 GMT, Christian Hagedorn wrote: >> jvmciCodeInstaller_aarch64.cpp references symbols defined by the ZGC unconditionally, causing the build to fail when ZGC is not included. 
>> >> This work is sponsored by The FreeBSD Foundation > > Looks good to me, too. @chhagedorn Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25039#issuecomment-2851338858 From rcastanedalo at openjdk.org Mon May 5 15:24:45 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 5 May 2025 15:24:45 GMT Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:27:22 GMT, Coleen Phillimore wrote: > Please review this possibly trivial change. > Tested with hs-precheckin-comp test list. Looks good to me, thanks. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25035#pullrequestreview-2815190706 From duke at openjdk.org Mon May 5 15:25:55 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 15:25:55 GMT Subject: Withdrawn: 8349563: Improve AbsNode::Value() for integer types In-Reply-To: References: Message-ID: On Wed, 19 Feb 2025 05:10:04 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. > > Thoughts and reviews would be appreciated! This pull request has been closed without being integrated.
------------- PR: https://git.openjdk.org/jdk/pull/23685 From duke at openjdk.org Mon May 5 15:28:45 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 15:28:45 GMT Subject: RFR: 8356182: Build fails on aarch64 without ZGC In-Reply-To: References: Message-ID: On Mon, 5 May 2025 13:24:51 GMT, snake66 wrote: > jvmciCodeInstaller_aarch64.cpp references symbols defined by the ZGC unconditionally, causing the build to fail when ZGC is not included. > > This work is sponsored by The FreeBSD FOundation @snake66 Your change (at version 7b492d8efcc55c026500db4f27486d76ac2d1f65) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25039#issuecomment-2851363707 From kvn at openjdk.org Mon May 5 15:35:46 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 15:35:46 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Sat, 3 May 2025 04:10:01 GMT, Ashutosh Mehra wrote: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. Yes, please do that in `decode_klass_not_null()`. We update other methods for nmethod caching in JDK 26. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2851382631 From sparasa at openjdk.org Mon May 5 15:44:05 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 5 May 2025 15:44:05 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v18] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove unused functions: orw and evex_prefix_int8_operand_ndd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/d1c1b077..67d9b3b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=16-17 Stats: 29 lines in 2 files changed: 0 ins; 29 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Mon May 5 15:44:08 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 5 May 2025 15:44:08 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v17] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 23:30:25 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> more clarifying comments next to boolean literals > > src/hotspot/cpu/x86/assembler_x86.cpp line 4403: > >> 4401: emit_arith(0x0B, 0xC0, dst, src); >> 4402: } >> 4403: > > No need 
to add this function now. Please see orw() removed in the updated code. > src/hotspot/cpu/x86/assembler_x86.cpp line 12979: > >> 12977: } >> 12978: emit_operand(src1, src2, 0); >> 12979: } > > This function could be removed, not used. Please see the unused function removed in the updated code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2073688221 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2073687771 From kvn at openjdk.org Mon May 5 15:55:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 15:55:48 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: <77aRy2Ss1fvM3EOR9VCGoKiwg4-8PsHOCDRgCd-ekn0=.203e4b18-7b98-4fa2-8391-94e7a412000d@github.com> On Sat, 3 May 2025 04:10:01 GMT, Ashutosh Mehra wrote: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. I think `reinit_heapbase()` is used after JNI calls to put it back into register for compiled code. 
I don't think For debugging your changes you can add a print into `reinit_heapbase()` using the UseNewCode flag and set the flag in `compiler_stubs_init()` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2851441182 From duke at openjdk.org Mon May 5 16:03:59 2025 From: duke at openjdk.org (duke) Date: Mon, 5 May 2025 16:03:59 GMT Subject: Withdrawn: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:59:44 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: > > ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) > > The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/22678 From kvn at openjdk.org Mon May 5 16:14:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 16:14:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: <1TLtkRe2ydHcPB5lnREFbmF4hlQ4rOBHyNXbplFujM0=.427f9764-dda9-41e4-a228-95f47426cf25@github.com> On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Looks fine to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2815333573 From kvn at openjdk.org Mon May 5 16:32:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 16:32:51 GMT Subject: RFR: 8356153: Shenandoah stubs are missing in AOT Code Cache addresses table In-Reply-To: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> References: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> Message-ID: <6X17kLbmnXqNUUBLyKhBKDKpFpDD5XfBYUzMPA5yClI=.aa69e1b1-5bd2-4650-8a85-8397b0e8e557@github.com> On Mon, 5 May 2025 09:19:40 GMT, Aleksey Shipilev wrote: > See the bug for reproducer. We actually have similar Shenandoah hunks down in Leyden repository, but we have apparently missed them when upstreaming [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209). We only need a small subset of stubs that adapters use: pre-barriers and phantom load-barriers. This matches what we do for G1 and Z as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `runtime/cds` with `-XX:+UseShenandoahGC` Good. And trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25028#pullrequestreview-2815370805 PR Comment: https://git.openjdk.org/jdk/pull/25028#issuecomment-2851551108 From shade at openjdk.org Mon May 5 16:32:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:32:51 GMT Subject: RFR: 8356153: Shenandoah stubs are missing in AOT Code Cache addresses table In-Reply-To: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> References: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> Message-ID: On Mon, 5 May 2025 09:19:40 GMT, Aleksey Shipilev wrote: > See the bug for reproducer. We actually have similar Shenandoah hunks down in Leyden repository, but we have apparently missed them when upstreaming [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209). We only need a small subset of stubs that adapters use: pre-barriers and phantom load-barriers. This matches what we do for G1 and Z as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `runtime/cds` with `-XX:+UseShenandoahGC` Thanks! I am integrating to unblock AOT testing with various GCs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25028#issuecomment-2851554905 From shade at openjdk.org Mon May 5 16:32:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:32:52 GMT Subject: Integrated: 8356153: Shenandoah stubs are missing in AOT Code Cache addresses table In-Reply-To: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> References: <8_Giy7duTflvM90PUBe2z8A01tc2kSe23RO1rEq-JHc=.875b1c68-31da-4688-a55c-0d2db8205113@github.com> Message-ID: On Mon, 5 May 2025 09:19:40 GMT, Aleksey Shipilev wrote: > See the bug for reproducer. 
We actually have similar Shenandoah hunks down in Leyden repository, but we have apparently missed them when upstreaming [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209). We only need a small subset of stubs that adapters use: pre-barriers and phantom load-barriers. This matches what we do for G1 and Z as well. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `runtime/cds` > - [x] Linux x86_64 server fastdebug, `runtime/cds` with `-XX:+UseShenandoahGC` This pull request has now been integrated. Changeset: f6876449 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/f68764490c9d355770475d26202fe10005375388 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod 8356153: Shenandoah stubs are missing in AOT Code Cache addresses table Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25028 From shade at openjdk.org Mon May 5 16:55:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 5 May 2025 16:55:49 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops Thank you! I'll wait a bit if @kimbarrett is able to confirm this matches the idea he had back in JBS comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2851636080 From kvn at openjdk.org Mon May 5 17:02:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 17:02:44 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. 
But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Nice work. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24966#pullrequestreview-2815464620 From duke at openjdk.org Mon May 5 18:12:49 2025 From: duke at openjdk.org (Yuri Gaevsky) Date: Mon, 5 May 2025 18:12:49 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> References: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> Message-ID: On Mon, 5 May 2025 10:17:27 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. 
> > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > change slli+add sequence to shadd

As you can expect, I am trying to implement the following code with RVV:

    for (; i + (N-1) < cnt; i += N) {
      h = 31^^N * h
        + 31^^(N-1) * val[i + 0]
        + 31^^(N-2) * val[i + 1]
        ...
        + 31^^1 * val[i + (N-2)]
        + 31^^0 * val[i + (N-1)];
    }
    for (; i < cnt; i++) {
      h = 31 * h + val[i];
    }

where `N` is the number of array elements processed per "chunk".

IIUC, the main issue with your approach is the "reverse" order of array elements versus the preloaded `31^^X` coeffs WHEN the remaining number of elems is less than `N`, say `M=N-1`:

    h = 31^^M * h
      + 31^^(M-1) * val[i + 0]
      + 31^^(M-2) * val[i + 1]
      ...
      + 31^^1 * val[i + (M-2)]
      + 31^^0 * val[i + (M-1)];

or, returning to our `N` for clarity:

    h = 31^^(N-1) * h
      + 31^^(N-2) * val[i + 0]
      + 31^^(N-3) * val[i + 1]
      ...
      + 31^^1 * val[i + (N-3)]
      + 31^^0 * val[i + (N-2)];

Now we need to "slide down" the preloaded multiplier coeffs in the designated vector register by one (as `M=N-1`) to be in "sync" with `val[i + X]` (maybe moving them into a temporary VR in the process), and moreover, DO this operation IFF the remaining `cnt` is less than `N` (==> an additional check on every iteration). That's probably acceptable only in the tail phase as a one-time operation, but NOT inside the main loop...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2851905398

From kxu at openjdk.org Mon May 5 18:15:54 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 May 2025 18:15:54 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID:

On Tue, 29 Apr 2025 16:42:41 GMT, Emanuel Peter wrote:

>> Hello @eme64. I pinged you in [an in-line review](https://github.com/openjdk/jdk/pull/23506#discussion_r2042974649). Could you please provide some comments on this assertion? This is currently blocking my progress and breaking the build.
Thank you very much! > > @tabjy Thanks for your patience, this one took me longer than I wanted. I responded like this above: > >> Hmm, ok I see. Why don't you remove the asserts for now, and we see how clear the code looks now. I think I asked for the consistency check because I was confused by the previous code structure. Maybe it is ok now as it is.

@eme64

> I still think that the patterns covered are more limited than preferable. [...] can we recognize patterns like this: a + (a << CON)?

I only aimed to cover the cases minimally needed for `a + a + ... + a`, for fear of adding even more complexity. Cases like `CON * a + a` are considered unintended side-effects of the way pattern matching is implemented. A flipped LHS and RHS, as in `a + (a << CON)`, is not recognized (by me or by `IdealIL()`). That is, I only match the RHS being a base variable (i.e., `a`).

I understand you think that, given the effort, we could explore more opportunities here. To explain the complexity, consider the following expression with base variable `a`:

      a + a + ((a + a) + a)
        ^ LHS(a) = RHS(a)
    = 2*a + ((a + a) + a)
          ^ unable to resolve RHS to 3*a without some recursion

I could, at the very least, try to swap LHS and RHS if no match is found, but the case above will still not benefit from swapping (without a recursive algorithm). Simpler cases like `a + (a << CON)` might.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2851911494

From coleenp at openjdk.org Mon May 5 18:20:47 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 5 May 2025 18:20:47 GMT Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical In-Reply-To: References: Message-ID:

On Mon, 5 May 2025 12:27:22 GMT, Coleen Phillimore wrote:

> Please review this possibly trivial change. > Tested with hs-precheckin-comp test list.

Is it trivial?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25035#issuecomment-2851923870 From vpaprotski at openjdk.org Mon May 5 18:35:46 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Mon, 5 May 2025 18:35:46 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Thu, 1 May 2025 06:42:18 GMT, Emanuel Peter wrote: >>> > @vpaprotsk Can you please give a little more details about what exactly went wrong here, and why your change is correct? >>> >>> @eme64 Thanks for looking. Point form in attempt to be concise: >>> >>> * Jatin brought this to my attention, we weren't sure whose code was at fault (i.e. I wrote the blend emulation, he wrote the compress_expand) and I got to the investigation first (i.e. see https://bugs.openjdk.org/browse/JDK-8354473) >>> * The mask for vblendvps instruction; actual instruction only cares about the MSB but for emulation we must have the mask to be either `FFF..FF` or `000..00`. In many places blend is used, this is already the case, so no need to recompute the mask. That's why the flag is provided (i.e. optimization). >>> * (Without fully understanding the entirety of compress_expand), it appears to me that in this function the mask in `permv` _must_ be computed explicitly. That's why the flag is changed. >> >> Hi @vpaprotsk , @eme64, >> >> Just to fill in the missing details about compress/expand handling on AVX2, we maintain an in-memory lookup table of permutation indices corresponding to a mask value. Each row of lookup table either holds a valid permute index, which is a positive index value less than the vector lane count OR a -1 index. >> >> Since blend emulation always expects to operate over a blend mask vector whose lanes either hold a -1 or a 0 value hence there is a need to re-compose the desired blend mask by signed extending the MSB bits to fill the entire lane. Your fix to recompute the mask looks good to me. 
>> >> >> Best Regards, >> Jatin > > @jatin-bhateja It seems the flag `-XX:+EnableX86ECoreOpts` is only enabled on some very specific machines. How important / widespread are these machines? Will they become more widespread over time? Or is this rather rare, and not worth investing too many resources? How does their importance compare to AVX and AVX2, or machines with only SSE2 or SSE4.1? Because we put a focus on SSE/AVX in internal testing, but I'm wondering if we should also test `EnableX86ECoreOpts` more. How does this flag interact with AVX features? Do ECore machines always have AVX2 for example? What would be good flag combinations here? @eme64 Let me know how those tests fared, and if/when I can integrate. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2851965915 From kxu at openjdk.org Mon May 5 18:49:07 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 May 2025 18:49:07 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v16] In-Reply-To: References: Message-ID: > [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495), which was [first merged](https://git.openjdk.org/jdk/pull/20754) and then backed out due to a regression. This patch redoes the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. > > When constantizing multiplications (possibly in the form of `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`). > > The following was implemented to address this issue. > > if (UseNewCode2) { > *multiplier = bt == T_INT > ?
(jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows > : ((jlong) 1) << con->get_int(); > } else { > *multiplier = ((jlong) 1 << con->get_int()); > } > > > Two new bitshift overflow tests were added. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: improve comment readability and struct helper functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23506/files - new: https://git.openjdk.org/jdk/pull/23506/files/7cce522f..133774b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23506&range=14-15 Stats: 37 lines in 2 files changed: 5 ins; 1 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/23506.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23506/head:pull/23506 PR: https://git.openjdk.org/jdk/pull/23506 From kxu at openjdk.org Mon May 5 18:49:07 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 5 May 2025 18:49:07 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v15] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 13:58:10 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> remove asserts, add more documentation > > src/hotspot/share/opto/addnode.cpp line 428: > >> 426: // >> 427: // Note this also converts, for example, original expression `(a*3) + a` into `4*a` and `(a<<2) + a` into `5*a`. A more >> 428: // generalized pattern `(a*b) + (a*c)` into `a*(b + c)` is handled by AddNode::IdealIL(). > > What about the pattern `(a * CON1) + (a << CON2)`? Is that handled here or by `AddNode::IdealIL()`? 
Please see https://github.com/openjdk/jdk/pull/23506#issuecomment-2851911494 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2073984538 From vlivanov at openjdk.org Mon May 5 19:08:48 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 5 May 2025 19:08:48 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Good work, Marc. 
High-level comment: I don't know what the future plans are, but as the patch stands now, it feels like it complicates both the design and the implementation. The original implementation relies on macro nodes which are later expanded into leaf runtime calls. What you propose introduces a new concept of "pure calls" which is: (1) not a CallNode anymore; and (2) relies on subclassing (which makes it hard to mix with other node properties). Moreover, I don't see much benefit in committing to a runtime call representation from the very beginning (early in high-level IR). Going forward, IMO the sweet spot is to support arbitrary nodes to be lowered into leaf runtime calls. You make a big step in that direction by relaxing requirements on `PureCall` to be just a CFG node (and not a full-blown `CallLeaf` node). The next step would be to relax the CFG node requirement and let the compiler pick the right place to insert it. (Existing expensive node support in C2 addresses some similar challenges.) And, as a complementary option, in some cases it may be just enough to mark individual call nodes as pure, so they can be pruned later if nobody consumes the result of their computation anymore. ------------- PR Review: https://git.openjdk.org/jdk/pull/24966#pullrequestreview-2815810010 From asmehra at openjdk.org Mon May 5 21:13:24 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 5 May 2025 21:13:24 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`.
Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into preserve-runtime-blobs-master - Address Vladimir's comments Signed-off-by: Ashutosh Mehra - Remove irrelevant comment Signed-off-by: Ashutosh Mehra - Fix win64 compile failures Signed-off-by: Ashutosh Mehra - Fix AOTCodeFlags.java test Signed-off-by: Ashutosh Mehra - Fix compile failure in minimal config Signed-off-by: Ashutosh Mehra - Revert back changes that added AOTRuntimeConstants. Ensure CompressedOops::base and CompressedKlssPointers::base does not change in production run Signed-off-by: Ashutosh Mehra - Fix merge conflicts Signed-off-by: Ashutosh Mehra - Store/load AsmRemarks and DbgStrings in aot code cache Signed-off-by: Ashutosh Mehra - Add missing external address in aarch64 Signed-off-by: Ashutosh Mehra - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab ------------- Changes: https://git.openjdk.org/jdk/pull/25019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=01 Stats: 1116 lines in 25 files changed: 874 ins; 125 del; 117 mod Patch: https://git.openjdk.org/jdk/pull/25019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25019/head:pull/25019 PR: https://git.openjdk.org/jdk/pull/25019 From asmehra at openjdk.org Mon May 5 21:13:24 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 5 May 2025 21:13:24 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Mon, 5 May 2025 15:32:54 GMT, Vladimir Kozlov wrote: > Yes, please do that in decode_klass_not_null(). > We update other methods for nmethod caching in JDK 26. I updated both encode and decode versions. Thanks to @adinn for providing aarch64 changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2852346882 From asmehra at openjdk.org Mon May 5 21:13:25 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 5 May 2025 21:13:25 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 17:34:38 GMT, Vladimir Kozlov wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Remove irrelevant comment >> >> Signed-off-by: Ashutosh Mehra >> - Fix win64 compile failures >> >> Signed-off-by: Ashutosh Mehra >> - Fix AOTCodeFlags.java test >> >> Signed-off-by: Ashutosh Mehra >> - Fix compile failure in minimal config >> >> Signed-off-by: Ashutosh Mehra >> - Revert back changes that added AOTRuntimeConstants. >> Ensure CompressedOops::base and CompressedKlssPointers::base does not >> change in production run >> >> Signed-off-by: Ashutosh Mehra >> - Fix merge conflicts >> >> Signed-off-by: Ashutosh Mehra >> - Store/load AsmRemarks and DbgStrings in aot code cache >> >> Signed-off-by: Ashutosh Mehra >> - Add missing external address in aarch64 >> >> Signed-off-by: Ashutosh Mehra >> - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab > > src/hotspot/share/code/aotCodeCache.cpp line 1119: > >> 1117: uint n = write_bytes(&offset, sizeof(uint)); >> 1118: if (n != sizeof(uint)) { >> 1119: return false; > > Consider using `id_for_C_string()` and recording the ID instead of copying the string. These strings should be recorded in the C strings table already. > If `id_for_C_string()` does not find the string, assert. We should add `add_C_string()` in the missing place. The asm remarks and dbg strings are not currently recorded in the C string table.
I tried to add these strings by calling `AOTCodeCache::add_C_string()` in `AsmRemarkCollection::insert()` but this results in adding LOTS of strings. So I add the strings to the string table only when writing the asm remarks. This keeps the string count in check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2074200389 From dlong at openjdk.org Mon May 5 21:35:47 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 5 May 2025 21:35:47 GMT Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:27:22 GMT, Coleen Phillimore wrote: > Please review this possibly trivial change. > Tested with hs-precheckin-comp test list. Please remove the #include "runtime/threadCritical.hpp" at the top of the file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25035#issuecomment-2852388209 From kvn at openjdk.org Mon May 5 21:59:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 21:59:48 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab This looks good. Let me test it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2852429548 From sviswanathan at openjdk.org Mon May 5 22:02:48 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 5 May 2025 22:02:48 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v18] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 15:44:05 GMT, Srinivas Vamsi Parasa wrote: >> The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
> > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > remove unused functions: orw and evex_prefix_int8_operand_ndd src/hotspot/cpu/x86/assembler_x86.cpp line 12976: > 12974: if (pre == VEX_SIMD_66) { > 12975: emit_int8(0x66); > 12976: } We could do this based on size instead: if (size == EVEX_16bit). src/hotspot/cpu/x86/assembler_x86.cpp line 12993: > 12991: if (pre == VEX_SIMD_66) { > 12992: emit_int8(0x66); > 12993: } We could do this based on size instead: if (size == EVEX_16bit). src/hotspot/cpu/x86/assembler_x86.cpp line 13009: > 13007: if (pre == VEX_SIMD_66) { > 13008: emit_int8(0x66); > 13009: } This is not used and could be removed. src/hotspot/cpu/x86/assembler_x86.cpp line 13044: > 13042: bool demote = is_demotable(no_flags, dst_enc, nds_enc); > 13043: if (demote) { > 13044: (size == EVEX_64bit) ? prefixq_and_encode(dst_enc) : prefix_and_encode(dst_enc); This could be: (size == EVEX_64bit) ? prefixq(dst_enc) : prefix(dst_enc); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2073975889 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2073976654 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2073972698 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2074129665 From kvn at openjdk.org Mon May 5 22:10:12 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 22:10:12 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. 
It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... 
and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab

1% improvement ;^)

ghost29:jdk_git2$ (perf stat -r 100 ./build/product/images/jdk/bin/java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:-AOTAdapterCaching -XX:-AOTStubCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed
0.0252305 +- 0.0000543 seconds time elapsed ( +- 0.22% )

ghost29:jdk_git2$ (perf stat -r 100 ./build/product/images/jdk/bin/java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:+AOTAdapterCaching -XX:-AOTStubCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed
0.0234828 +- 0.0000415 seconds time elapsed ( +- 0.18% )

ghost29:jdk_git2$ (perf stat -r 100 ./build/product/images/jdk/bin/java -XX:AOTCache=app.aotcache -XX:+UnlockDiagnosticVMOptions -XX:+AOTAdapterCaching -XX:+AOTStubCaching -cp hello.jar HelloWorld > /dev/null) 2>&1 | grep elapsed
0.0232267 +- 0.0000355 seconds time elapsed ( +- 0.15% )

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2852446885

From kvn at openjdk.org Mon May 5 23:30:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 23:30:23 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID:

On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote:

>> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab Test crashes on linux-x64 with debug VM: % make test JTREG=AOT_JDK=true CONF=fastdebug TEST=compiler/c2/cr6865031/Test.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2852646752 From lmesnik at openjdk.org Mon May 5 23:49:33 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 5 May 2025 23:49:33 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v2] In-Reply-To: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: > The failing test is excluded. > No plan to fix, so no bugid is used. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fixed name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25052/files - new: https://git.openjdk.org/jdk/pull/25052/files/52d1ede8..064369bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25052/head:pull/25052 PR: https://git.openjdk.org/jdk/pull/25052 From kvn at openjdk.org Mon May 5 23:53:18 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 23:53:18 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. 
> Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab

Test `runtime/cds/appcds/aotClassLinking/MethodHandleTest.java` crashed on linux-aarch64:

#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000ffffb4df1f84, pid=3645763, tid=3645766
#
# JRE version: Java(TM) SE Runtime Environment (25.0) (fastdebug build 25-internal-LTS-2025-05-05-2218263.vladimir.kozlov.jdkgit2)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 25-internal-LTS-2025-05-05-2218263.vladimir.kozlov.jdkgit2, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# v ~RuntimeStub::Shared Runtime throw_NullPointerException_at_call_blob 0x0000ffff73dd2db0

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2852738551

From lmesnik at openjdk.org Mon May 5 23:56:31 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 5 May 2025 23:56:31 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v3] In-Reply-To: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID:

> The failing test is excluded. > No plan to fix, so no bugid is used.
Leonid Mesnik has updated the pull request incrementally with two additional commits since the last revision: - year fixed - spaces updated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25052/files - new: https://git.openjdk.org/jdk/pull/25052/files/064369bd..62096b16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25052/head:pull/25052 PR: https://git.openjdk.org/jdk/pull/25052 From kvn at openjdk.org Mon May 5 23:57:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 5 May 2025 23:57:17 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab I attached hs_err file to RFE in JBS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2852759147 From lmesnik at openjdk.org Tue May 6 00:12:12 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 6 May 2025 00:12:12 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v4] In-Reply-To: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: > The failing test is excluded. > No plan to fix, so no bugid is used. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25052/files - new: https://git.openjdk.org/jdk/pull/25052/files/62096b16..47606397 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25052/head:pull/25052 PR: https://git.openjdk.org/jdk/pull/25052 From ccheung at openjdk.org Tue May 6 00:42:12 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Tue, 6 May 2025 00:42:12 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v4] In-Reply-To: References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: On Tue, 6 May 2025 00:12:12 GMT, Leonid Mesnik wrote: >> The failing test is excluded. >> No plan to fix, so no bugid is used. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > fix Hi Leonid, Can you also problem list one more hotspot test? The test failed due to the same reason. 
diff --git a/test/hotspot/jtreg/ProblemList-AotJdk.txt b/test/hotspot/jtreg/ProblemList-AotJdk.txt index 2528f8d377e..047fc6d33f8 100644 --- a/test/hotspot/jtreg/ProblemList-AotJdk.txt +++ b/test/hotspot/jtreg/ProblemList-AotJdk.txt @@ -3,6 +3,7 @@ runtime/NMT/NMTWithCDS.java 0000000 generic-all runtime/symbols/TestSharedArchiveConfigFile.java 0000000 generic-all gc/arguments/TestSerialHeapSizeFlags.java 0000000 generic-all +gc/arguments/TestCompressedClassFlags.java 0000000 generic-all gc/TestAllocateHeapAtMultiple.java 0000000 generic-all gc/TestAllocateHeapAt.java 0000000 generic-all ------------- PR Comment: https://git.openjdk.org/jdk/pull/25052#issuecomment-2852850728 From bulasevich at openjdk.org Tue May 6 01:28:29 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 6 May 2025 01:28:29 GMT Subject: RFR: 8355896: Lossy narrowing cast of JVMCINMethodData::size In-Reply-To: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> References: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> Message-ID: On Wed, 30 Apr 2025 13:10:19 GMT, Boris Ulasevich wrote: > In https://github.com/openjdk/jdk/pull/21276 mutable_data, which includes relocations, metadata, and jvmci_data, was moved to a separately malloc'ed blob. The nmethod (a CodeBlob) holds a pointer to the mutable_data blob and stores its internal offsets. > > As part of that change, I reused the former uint16_t offset field to store jvmci_data_size. This turned out to be incorrect, since jvmci_data can exceed 64 KB (as shown in https://github.com/openjdk/jdk/pull/24753). > > The most direct fix would be to change jvmci_data_size to uint, placing it alongside other int fields to avoid padding. However, in fact on my build this increases the size of the nmethod structure from 240 to 248 bytes, which I would prefer to avoid. > > Instead, I propose storing metadata_size in the existing uint16_t field. 
The average metadata_size is approximately 140 bytes, and the maximum observed in practice is around 4 KB. While, like oops_size, this value is not formally guaranteed to remain below 64 KB, no cases have been observed where this limit is exceeded. A GUARANTEE check is included to immediately catch any overflow if it ever occurs. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24965#issuecomment-2852944587 From bulasevich at openjdk.org Tue May 6 01:28:30 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 6 May 2025 01:28:30 GMT Subject: Integrated: 8355896: Lossy narrowing cast of JVMCINMethodData::size In-Reply-To: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> References: <_NK25ix3znr_ZqssJTXhXQ1-BqHNq_d3toJSGQ9S1mU=.a5279c90-4441-4a70-943e-c9c5f97d9252@github.com> Message-ID: On Wed, 30 Apr 2025 13:10:19 GMT, Boris Ulasevich wrote: > In https://github.com/openjdk/jdk/pull/21276 mutable_data, which includes relocations, metadata, and jvmci_data, was moved to a separately malloc'ed blob. The nmethod (a CodeBlob) holds a pointer to the mutable_data blob and stores its internal offsets. > > As part of that change, I reused the former uint16_t offset field to store jvmci_data_size. This turned out to be incorrect, since jvmci_data can exceed 64 KB (as shown in https://github.com/openjdk/jdk/pull/24753). > > The most direct fix would be to change jvmci_data_size to uint, placing it alongside other int fields to avoid padding. However, in fact on my build this increases the size of the nmethod structure from 240 to 248 bytes, which I would prefer to avoid. > > Instead, I propose storing metadata_size in the existing uint16_t field. The average metadata_size is approximately 140 bytes, and the maximum observed in practice is around 4 KB. 
While, like oops_size, this value is not formally guaranteed to remain below 64 KB, no cases have been observed where this limit is exceeded. A GUARANTEE check is included to immediately catch any overflow if it ever occurs. This pull request has now been integrated. Changeset: aea28371 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/aea2837143289800cfbb7044de4f105e87e233ff Stats: 12 lines in 2 files changed: 4 ins; 0 del; 8 mod 8355896: Lossy narrowing cast of JVMCINMethodData::size Reviewed-by: kvn, dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/24965 From vlivanov at openjdk.org Tue May 6 01:33:20 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 01:33:20 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> Message-ID: On Mon, 5 May 2025 04:06:02 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2.
In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: > > - Updating comment > - Review comments resolutions It does look much better now. Thanks! Some comments/suggestions follow. src/hotspot/cpu/x86/vm_version_x86.cpp line 853: > 851: > 852: if (cpu_family() > 4) { // it supports CPUID > 853: _features = _cpuid_info.feature_flags(); // These can be changed by VM settings You don't need to change this code if you equip `VM_Features` with a copy constructor. src/hotspot/cpu/x86/vm_version_x86.cpp line 1102: > 1100: size_t buf_iter = cpu_info_size; > 1101: for (uint64_t i = 0; i < features_vector_size(); i++) { > 1102: insert_features_names(features_vector_elem(i), buf + buf_iter, sizeof(buf) - buf_iter, _features_names, 64 * i); `Abstract_VM_Version::insert_features_names` is used only on x86. You can move it to `vm_version_x86.cpp/.hpp` and adjust to new layout. src/hotspot/cpu/x86/vm_version_x86.hpp line 707: > 705: // > 706: static bool supports_cpuid() { return _features != 0; } > 707: static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } Since you touch this code anyway, I suggest to use this opportunity to automatically derive this code using `CPU_FEATURE_FLAGS` macro. (As an example [1].) 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp#L147 src/hotspot/cpu/x86/vm_version_x86.hpp line 753: > 751: // Feature identification which can be affected by VM settings > 752: // > 753: static bool supports_cpuid() { return Abstract_VM_Version::vm_features_exist(); } Is `VM_Features::_features_vector_size > 0` equivalent to `_features != 0`? I believe you can simply drop `supports_cpuid()`. The x86-32 bit port is gone and even there `cpuid` support was mandatory. src/hotspot/share/runtime/abstract_vm_version.hpp line 51: > 49: class VM_Features { > 50: public: > 51: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; Why did you decide to declare a new type name for a fixed-size array type? I see you use `FeatureVector` in `vmStructs*` and JVMCI code. Does it make things simpler there? src/hotspot/share/runtime/abstract_vm_version.hpp line 91: > 89: > 90: // CPU feature flags vector, can be affected by VM settings. > 91: static VM_Features _vm_target_features; Unless we plan to migrate all platforms all at once, I suggest moving this code into `VM_Version` and keeping the same names (`_features` and `_cpu_features`). Ideally, the `_features` field can be moved from `Abstract_VM_Version` to platform-specific `VM_Version`s across all platforms. But leaving it as is for now is also fine with me. There's a precedent: `VM_Version` already overrides the `_features` field on s390 [1]. The `VM_Features` class can start as x86-specific, but for advertisement purposes it makes sense to keep it in `abstract_vm_version.hpp`. Alternatively, `Abstract_VM_Version::_features` can be converted from `uint64_t` to `VM_Features` and non-x86 platforms can be covered by providing overloads for currently used operators (it's mostly `|=`, `&=`, and `&`, plus conversions).
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/s390/vm_version_s390.hpp#L130 src/hotspot/share/runtime/abstract_vm_version.hpp line 97: > 95: > 96: static void sync_cpu_features() { > 97: memcpy(_cpu_target_features._features_vector, _vm_target_features._features_vector, Any particular reason to use `memcpy`/`memset` and not a loop over `_features_vector` array? I believe once you define default and copy constructors for `VM_Features`, `sync_cpu_features()` and `clear_cpu_features()` won't be needed anymore. src/hotspot/share/runtime/abstract_vm_version.hpp line 183: > 181: static const char* printable_jdk_debug_level(); > 182: > 183: static uint64_t features() { Not used. Drop it. src/hotspot/share/runtime/init.cpp line 68: > 66: void codeCache_init(); > 67: void VM_Version_init(); > 68: void VM_Version_pre_init(); Redundant declaration. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/amd64/AMD64HotSpotVMConfig.java line 94: > 92: final long amd64CET_IBT = getConstant("VM_Version::CPU_CET_IBT", Long.class); > 93: final long amd64CET_SS = getConstant("VM_Version::CPU_CET_SS", Long.class); > 94: final long avx10_1 = getConstant("VM_Version::CPU_AVX10_1", Long.class); Leave them as is. @mur47x111 plans to remove them [1]. 
[1] https://github.com/openjdk/jdk/pull/24329#issuecomment-2838223030 ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2815634822 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074470895 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074469800 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074484317 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074481382 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074502713 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074479165 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074496719 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074480203 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2073919224 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074519224 From vlivanov at openjdk.org Tue May 6 01:33:21 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 01:33:21 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12] In-Reply-To: <9d9DVuqRAeb_8kiEwkPQH6g2eBU5Jc_5ZSBAi1in9X0=.1d955598-f466-46ff-8b1f-71c87abd6313@github.com> References: <9d9DVuqRAeb_8kiEwkPQH6g2eBU5Jc_5ZSBAi1in9X0=.1d955598-f466-46ff-8b1f-71c87abd6313@github.com> Message-ID: On Mon, 5 May 2025 03:54:24 GMT, Jatin Bhateja wrote: >> I prefer explicit accessor calls on corresponding instance fields. >> >> It's confusing to see `VM_Version::CpuidInfo::feature_flags()` implicitly modifying `_dynamic_features_vector` through macros. > > I have changed this local routine name to install_feature_flags to conform to its semantics It's still counter-intuitive to see `VM_Version::CpuidInfo` implicitly initialize a field in the `Abstract_VM_Version` class. I prefer the original code shape. Any problems with the following code shape?
VM_Features VM_Version::CpuidInfo::feature_flags() const { VM_Features result; if (std_cpuid1_edx.bits.cmpxchg8 != 0) { result.set_feature(CPU_CX8); } ... return result; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2074474099 From dzhang at openjdk.org Tue May 6 01:39:26 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 6 May 2025 01:39:26 GMT Subject: RFR: 8356188: RISC-V: Cleanup effect of vmaskcmp_fp Message-ID: Hi all, Please take a look and review this PR, thanks! See "The RISC-V Instruction Set Manual" at https://riscv.org/technical/specifications/. In the Vector Floating-Point Compare Instructions section: The destination mask vector register may be the same as the source vector mask register (v0). Also, the integer form of `vmaskcmp` has no effect too, so remove the effect of vmaskcmp_fp. ### Testing qemu-system 9.1.0 with UseRVV (ubuntu24.10): * [x] Run test/jdk/jdk/incubator/vector (fastdebug) ------------- Commit messages: - 8356188: RISC-V: Cleanup effect of vmaskcmp_fp Changes: https://git.openjdk.org/jdk/pull/25055/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25055&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356188 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25055/head:pull/25055 PR: https://git.openjdk.org/jdk/pull/25055 From lmesnik at openjdk.org Tue May 6 01:44:29 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 6 May 2025 01:44:29 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v5] In-Reply-To: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: > The failing test is excluded. > No plan to fix, so no bugid is used. 
Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: added test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25052/files - new: https://git.openjdk.org/jdk/pull/25052/files/47606397..2b4deef8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25052&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25052.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25052/head:pull/25052 PR: https://git.openjdk.org/jdk/pull/25052 From epavlova at openjdk.org Tue May 6 02:02:21 2025 From: epavlova at openjdk.org (Ekaterina Pavlova) Date: Tue, 6 May 2025 02:02:21 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v5] In-Reply-To: References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: <8ZXxjVM5_Z8enD3PifiITpHdHtbeBo0PEHLboO4DoXk=.2fe46c82-f4f4-48d7-b68b-b7293aedb75c@github.com> On Tue, 6 May 2025 01:44:29 GMT, Leonid Mesnik wrote: >> The failing test is excluded. >> No plan to fix, so no bugid is used. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > added test Thanks for integrating the changes. ------------- Marked as reviewed by epavlova (Committer). PR Review: https://git.openjdk.org/jdk/pull/25052#pullrequestreview-2816656366 From iklam at openjdk.org Tue May 6 02:02:21 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 6 May 2025 02:02:21 GMT Subject: RFR: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking [v5] In-Reply-To: References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: On Tue, 6 May 2025 01:44:29 GMT, Leonid Mesnik wrote: >> The failing test is excluded. >> No plan to fix, so no bugid is used. 
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > added test LGTM and can be considered as trivial change. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25052#pullrequestreview-2816657202 From lmesnik at openjdk.org Tue May 6 02:02:22 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 6 May 2025 02:02:22 GMT Subject: Integrated: 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking In-Reply-To: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> References: <2GJNl0usfhrC1avlAfFDump-GdWzUWNMRl971HhUKVA=.2b70302f-e134-4934-9be3-335ee97c00fc@github.com> Message-ID: On Mon, 5 May 2025 23:24:21 GMT, Leonid Mesnik wrote: > The failing test is excluded. > No plan to fix, so no bugid is used. This pull request has now been integrated. Changeset: 64b58f6a Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/64b58f6a54c1197002527bdb6ba7b48283dc634e Stats: 34 lines in 2 files changed: 34 ins; 0 del; 0 mod 8356089: java/lang/IO/IO.java fails with -XX:+AOTClassLinking Reviewed-by: epavlova, iklam ------------- PR: https://git.openjdk.org/jdk/pull/25052 From fyang at openjdk.org Tue May 6 02:13:12 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 6 May 2025 02:13:12 GMT Subject: RFR: 8356188: RISC-V: Cleanup effect of vmaskcmp_fp In-Reply-To: References: Message-ID: On Tue, 6 May 2025 01:34:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > See "The RISC-V Instruction Set Manual" at https://riscv.org/technical/specifications/. In the Vector Floating-Point Compare Instructions section: > > The destination mask vector register may be the same as the source vector mask register (v0). > > Also, the integer form of `vmaskcmp` has no effect too, so remove the effect of vmaskcmp_fp. 
> > ### Testing > qemu-system 9.1.0 with UseRVV (ubuntu24.10): > * [x] Run test/jdk/jdk/incubator/vector (fastdebug) Looks fine. Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25055#pullrequestreview-2816685869 From dzhang at openjdk.org Tue May 6 02:42:32 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 6 May 2025 02:42:32 GMT Subject: RFR: 8356188: RISC-V: Cleanup effect of vmaskcmp_fp [v2] In-Reply-To: References: Message-ID: > Hi all, > Please take a look and review this PR, thanks! > > See "The RISC-V Instruction Set Manual" at https://riscv.org/technical/specifications/. In the Vector Floating-Point Compare Instructions section: > > The destination mask vector register may be the same as the source vector mask register (v0). > > Also, the integer form of `vmaskcmp` has no effect too, so remove the effect of vmaskcmp_fp. > > ### Testing > qemu-system 9.1.0 with UseRVV (ubuntu24.10): > * [x] Run test/jdk/jdk/incubator/vector (fastdebug) Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8356188 - 8356188: RISC-V: Cleanup effect of vmaskcmp_fp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25055/files - new: https://git.openjdk.org/jdk/pull/25055/files/89ff0627..008455a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25055&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25055&range=00-01 Stats: 35 lines in 3 files changed: 34 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25055/head:pull/25055 PR: https://git.openjdk.org/jdk/pull/25055 From gcao at openjdk.org Tue May 6 02:52:13 2025 From: gcao at openjdk.org (Gui Cao) Date: Tue, 6 May 2025 02:52:13 GMT Subject: RFR: 8356188: RISC-V: Cleanup effect of vmaskcmp_fp [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 02:42:32 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! >> >> See "The RISC-V Instruction Set Manual" at https://riscv.org/technical/specifications/. In the Vector Floating-Point Compare Instructions section: >> >> The destination mask vector register may be the same as the source vector mask register (v0). >> >> Also, the integer form of `vmaskcmp` has no effect too, so remove the effect of vmaskcmp_fp. >> >> ### Testing >> qemu-system 9.1.0 with UseRVV (ubuntu24.10): >> * [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8356188 > - 8356188: RISC-V: Cleanup effect of vmaskcmp_fp Looks good to me. ------------- Marked as reviewed by gcao (Author). 
PR Review: https://git.openjdk.org/jdk/pull/25055#pullrequestreview-2816731409 From epeter at openjdk.org Tue May 6 05:54:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 05:54:17 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 14:22:57 GMT, Jatin Bhateja wrote: >>> Let me tune this check and update the test. >> >> For me to approve this code, you will have to do more than that. I will need: >> - Proof of the implemented logic. >> - More tests. > > Hi @eme64, can you kindly run this latest version through your testing, please. @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2853354513 From epeter at openjdk.org Tue May 6 06:04:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 06:04:15 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:54:09 GMT, Volodymyr Paprotski wrote: > It looks like the `permv` mask isn't always 'all-ones' or 'all-zeroes'.
(Which is OK for real blend, but needs to be enforced via the flag for blend emulation) > > Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine) > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP >>> jtreg:test/jdk/jdk/incubator/vector 83 71 10 0 2 << > ============================== > TEST FAILURE > > After the fix: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS > > And on an AVX512 machine: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS Looks reasonable to me. I did run our testing, so it won't break our CI. But I think we are not yet running our testing with `-XX:+EnableX86ECoreOpts`, so no guarantees there ;) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24645#pullrequestreview-2816950724 From epeter at openjdk.org Tue May 6 06:22:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 06:22:16 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: <8f0ksVvO5Bes8UDkNLkRnG8IvXYIAvAGncWwbQSs8S4=.5417ecc8-b257-4df8-953a-f6d711e26534@github.com> On Thu, 3 Apr 2025 08:22:26 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> For Christian > > That looks good to me, thanks for bearing with me!
@chhagedorn @TobiHartmann Thanks for the reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2853401862 From chagedorn at openjdk.org Tue May 6 06:29:28 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 06:29:28 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v15] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 13:06:36 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: > > - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects > - Apply Tobias' review suggestions > > Co-authored-by: Tobias Hartmann > - Apply suggestions from code review > > Co-authored-by: Andrey Turbanov > - For Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix whitespace issues > - Updates for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - refactor with checkEQWithRawBits > - upate copyright > - ... and 7 more: https://git.openjdk.org/jdk/compare/d24a8877...23a22389 Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2816991583 From epeter at openjdk.org Tue May 6 06:29:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 06:29:28 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v15] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 06:24:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 17 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - Apply Tobias' review suggestions >> >> Co-authored-by: Tobias Hartmann >> - Apply suggestions from code review >> >> Co-authored-by: Andrey Turbanov >> - For Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - fix whitespace issues >> - Updates for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - refactor with checkEQWithRawBits >> - upate copyright >> - ... and 7 more: https://git.openjdk.org/jdk/compare/d24a8877...23a22389 > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2853412641 From epeter at openjdk.org Tue May 6 06:29:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 06:29:29 GMT Subject: Integrated: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 11:15:58 GMT, Emanuel Peter wrote: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. 
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. This pull request has now been integrated. Changeset: 9f8fbf29 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/9f8fbf292278d995c9fa112d8f97b2375f619537 Stats: 684 lines in 3 files changed: 581 ins; 2 del; 101 mod 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24224 From iveresov at openjdk.org Tue May 6 06:31:43 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 6 May 2025 06:31:43 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v13] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: - Merge branch 'master' into pp2 - Fix compile - Fix additional issues - Make sure command line flags that affect MDO layout are consistent - Fix semantics change from the previous commit - Port 8355915: [leyden] Crash in MDO clearing the unloaded array type - Fix flag behavior - Fix log tags - Remove the proxy class counter - Address review comments part 2 - ... 
and 33 more: https://git.openjdk.org/jdk/compare/e09d2e27...7d22a42a

-------------

Changes: https://git.openjdk.org/jdk/pull/24886/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=12
Stats: 3231 lines in 60 files changed: 3011 ins; 103 del; 117 mod
Patch: https://git.openjdk.org/jdk/pull/24886.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886

PR: https://git.openjdk.org/jdk/pull/24886

From rcastanedalo at openjdk.org Tue May 6 07:22:17 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 6 May 2025 07:22:17 GMT
Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 18:18:12 GMT, Coleen Phillimore wrote:

> Is it trivial?

I am not confident enough to declare it trivial, mostly because the code is not exercised by any regression test (I tested the change manually and it seemed fine). It would be good to have a second review. Perhaps from @dean-long, or @chhagedorn who was the last one to touch this code?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25035#issuecomment-2853519389

From mchevalier at openjdk.org Tue May 6 07:46:14 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 6 May 2025 07:46:14 GMT
Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls
In-Reply-To:
References:
Message-ID:

On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote:

> A first part toward a better support of pure functions.
>
> ## Pure Functions
>
> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt, etc. It's really a function that you can execute anywhere, with whichever arguments, without any effect other than wasting time. Integer division is not pure because dividing by zero throws. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases.
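To make the distinction in the paragraph above concrete, here is a small Java sketch. It only shows the language-level behavior that makes floating-point remainder a candidate for a pure call while integer remainder is not; it says nothing about how C2 represents these operations, and the class and method names are hypothetical.

```java
final class PureCallSketch {
    // Floating-point remainder never throws: a zero divisor simply yields NaN.
    static double doubleRem(double a, double b) {
        return a % b;
    }

    // Integer remainder throws ArithmeticException on a zero divisor,
    // so it has an observable effect on control flow.
    static boolean intRemThrows(int a, int b) {
        try {
            int unused = a % b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(doubleRem(5.0, 0.0)); // NaN
        System.out.println(1.0 / 0.0);           // Infinity
        System.out.println(intRemThrows(5, 0));  // true
    }
}
```

If the result of `doubleRem` is unused, dropping the call changes nothing observable; dropping an integer remainder whose divisor may be zero would swallow an exception.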
>
> ## Scope
>
> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well.
>
> ## Implementation Overview
>
> We created a new kind of node for pure calls, which is expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view!
>
> IR framework and IGV needed a little bit of fixing.
>
> Thanks,
> Marc

Thanks for the comment. I'll think deeper about it.

I started by trying to make PureCall a subclass of Call (or a property of LeafCall), but that broke a lot of things that relied on invariants of CallNode that no longer held. After some time tracking bugs and trying to fix them, I thought it would be simpler to have a new kind of node, and it would have less impact on existing code.

Another reason I've changed it to a direct sub-class of Node is that I felt it made little sense to be a Call (or sub-class of) since Calls are Safepoint, but pure calls don't need to be (and similar "conceptual" problems). It seemed like a hack to me.

About
> support arbitrary nodes to be lowered into leaf runtime calls.

I don't think I understand what you mean.

Overall, I see the weaknesses of my design, but I'm not sure which direction to take instead.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2853576338

From duke at openjdk.org Tue May 6 07:58:18 2025
From: duke at openjdk.org (snake66)
Date: Tue, 6 May 2025 07:58:18 GMT
Subject: RFR: 8356182: Build fails on aarch64 without ZGC
In-Reply-To:
References:
Message-ID: <3vJdr1hZW4puC-Ow5d6rAEjDIrdJykt-M3OZSIfWhdU=.ce3df94b-6cff-4348-803a-38ff895862f0@github.com>

On Mon, 5 May 2025 13:24:51 GMT, snake66 wrote:

> jvmciCodeInstaller_aarch64.cpp references symbols defined by the ZGC unconditionally, causing the build to fail when ZGC is not included.
>
> This work is sponsored by The FreeBSD Foundation

If anybody would be willing to sponsor this trivial change, it'd be much appreciated. Thanks in advance!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25039#issuecomment-2853606562

From rcastanedalo at openjdk.org Tue May 6 08:16:22 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 6 May 2025 08:16:22 GMT
Subject: RFR: 8354520: IGV: dump contextual information [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 11:37:04 GMT, Roberto Castañeda Lozano wrote:

>> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced.
>> The changeset dumps, for every compilation, the following additional properties:
>>
>> - JVM arguments
>> - platform information
>> - JVM version information
>> - date and time
>> - process ID
>> - (compiler) thread ID
>>
>> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f)
>>
>> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped:
>>
>> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01)
>>
>> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...) are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter):
>>
>> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6)
>>
>> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.:
>>
>>
>> define igv
>>   p igv_print(true, $sp, $fp, $pc)
>> end
>>
>> define igv_node
>>   p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc)
>> end
>>
>>
>> Thanks to @TobiHartmann for providing useful feedback!
>>
>> #### Testing
>>
>> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
>> - Tested interactive usage manually via `gdb` and `rr` on linux-x64.
>> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure.
>
> Roberto Castañeda Lozano has updated the pull request incrementally with seven additional commits since the last revision:
>
> - Dump CPU features
> - Make frame pointer parameters const whenever possible
> - Pass pointer to initial frame to print_stack
> - Refactor loop
> - Inline _current into its only use
> - Improve naming and commenting of stack-walking predicates
> - Extend comments with debugger usage examples

Manuel, Damon, and Emanuel: thanks again for reviewing!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24724#issuecomment-2853656909

From rcastanedalo at openjdk.org Tue May 6 08:19:32 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 6 May 2025 08:19:32 GMT
Subject: Integrated: 8354520: IGV: dump contextual information
In-Reply-To:
References:
Message-ID: <6ndyb9shWuOVG9SUKAdByZ94ui0OGxrRHu1a9K6XPRc=.930911c0-8875-4a4c-a83e-aa2421d0744a@github.com>

On Thu, 17 Apr 2025 13:05:03 GMT, Roberto Castañeda Lozano wrote:

> This changeset extends the IGV graph dumps with additional properties that ease tracing the dumps back to the context in which they were produced. The changeset dumps, for every compilation, the following additional properties:
>
> - JVM arguments
> - platform information
> - JVM version information
> - date and time
> - process ID
> - (compiler) thread ID
>
> ![compilation-properties](https://github.com/user-attachments/assets/8ddc8fb9-c348-4761-8e19-c70633a1b59f)
>
> Additionally, the changeset produces and dumps the C2 stack trace from which each graph is dumped:
>
> ![c2-stack-trace](https://github.com/user-attachments/assets/085547ee-b0b3-4a38-86f1-9df79cf1cc01)
>
> This should be particularly useful in an interactive context, where the user steps through C2 code using a debugger and dumps graphs at different points. To produce a stack trace in this context, the usual debugger-entry C2 functions (`igv_print`, `igv_append`, `Node::dump_bfs`, ...)
> are extended with extra arguments to specify the stack handling registers (stack pointer, frame pointer, and program counter):
>
> ![c2-stack-trace-from-gdb](https://github.com/user-attachments/assets/29de2964-ee2d-4f5f-bcf7-d81e1bc6c8a6)
>
> The inconvenience of manually specifying the stack handling registers can be addressed by hiding them in debugger user-defined commands, e.g.:
>
>
> define igv
>   p igv_print(true, $sp, $fp, $pc)
> end
>
> define igv_node
>   p find_node($arg0)->dump_bfs(0, 0, "!", $sp, $fp, $pc)
> end
>
>
> Thanks to @TobiHartmann for providing useful feedback!
>
> #### Testing
>
> - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode).
> - Tested interactive usage manually via `gdb` and `rr` on linux-x64.
> - Tested automatically that dumping thousands of graphs does not trigger any assertion failure.

This pull request has now been integrated.

Changeset: def907ab
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/def907ab89f3e5593aef17dcc61807e2836d41ae
Stats: 241 lines in 7 files changed: 203 ins; 0 del; 38 mod

8354520: IGV: dump contextual information

Reviewed-by: epeter, dfenacci

-------------

PR: https://git.openjdk.org/jdk/pull/24724

From duke at openjdk.org Tue May 6 08:22:22 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Tue, 6 May 2025 08:22:22 GMT
Subject: RFR: 8258229: Crash in nmethod::reloc_string_for [v2]
In-Reply-To:
References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com>
Message-ID:

On Tue, 29 Apr 2025 12:56:07 GMT, Manuel Hässig wrote:

>> ## Issue Summary
>>
>> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly.
>> If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match.
>>
>> ## Change Summary
>>
>> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
>>
>> All changes of this PR summarized:
>> - add a regression test,
>> - update the relocation information after patching the method entry for making it not entrant,
>> - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing.
>>
>> ## Testing
>>
>> I ran tiers 1 through 3 and Oracle internal testing.
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
> - Merge branch 'master' into jdk-8258229-nmethod
> - Add DeoptimizeALot and fix typo in test
> - Hold NMethodState_lock while printing an nmethod
>
> This prevents data races on the relocation info when code is patched.
> - Update relocation info when making method not entrant
> - Add regression test

Thank you for the reviews, everyone!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24831#issuecomment-2853669693

From duke at openjdk.org Tue May 6 08:22:22 2025
From: duke at openjdk.org (duke)
Date: Tue, 6 May 2025 08:22:22 GMT
Subject: RFR: 8258229: Crash in nmethod::reloc_string_for [v2]
In-Reply-To:
References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com>
Message-ID:

On Tue, 29 Apr 2025 12:56:07 GMT, Manuel Hässig wrote:

>> ## Issue Summary
>>
>> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match.
>>
>> ## Change Summary
>>
>> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
>>
>> All changes of this PR summarized:
>> - add a regression test,
>> - update the relocation information after patching the method entry for making it not entrant,
>> - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing.
>>
>> ## Testing
>>
>> I ran tiers 1 through 3 and Oracle internal testing.
>
> Manuel Hässig has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains five additional commits since the last revision:
>
> - Merge branch 'master' into jdk-8258229-nmethod
> - Add DeoptimizeALot and fix typo in test
> - Hold NMethodState_lock while printing an nmethod
>
> This prevents data races on the relocation info when code is patched.
> - Update relocation info when making method not entrant
> - Add regression test

@mhaessig
Your change (at version 956f45a624789d415d04da246b154e421e54d568) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24831#issuecomment-2853672451

From duke at openjdk.org Tue May 6 08:31:18 2025
From: duke at openjdk.org (snake66)
Date: Tue, 6 May 2025 08:31:18 GMT
Subject: Integrated: 8356182: Build fails on aarch64 without ZGC
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 13:24:51 GMT, snake66 wrote:

> jvmciCodeInstaller_aarch64.cpp references symbols defined by the ZGC unconditionally, causing the build to fail when ZGC is not included.
>
> This work is sponsored by The FreeBSD Foundation

This pull request has now been integrated.

Changeset: 8c4f2ff2
Author: Harald Eilertsen
Committer: Christian Hagedorn
URL: https://git.openjdk.org/jdk/commit/8c4f2ff21e21b158c333b3d36fcf323f68f4d187
Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod

8356182: Build fails on aarch64 without ZGC

This work was sponsored by The FreeBSD Foundation

Reviewed-by: stefank, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/25039

From chagedorn at openjdk.org Tue May 6 08:34:13 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 6 May 2025 08:34:13 GMT
Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical
In-Reply-To:
References:
Message-ID: <1T8DqY2Rr13uSyqY09pPUruJ-YA6ZyyBddvo0SuiYdA=.33d18749-7dd5-466f-80b3-1230ca08f1d3@github.com>

On Mon, 5 May 2025 12:27:22 GMT, Coleen Phillimore wrote:

> Please review this possibly trivial change.
> Tested with hs-precheckin-comp test list.
I once just moved the code around. But the `ThreadCritical` seems to have been here since the initial load. So, I'm not sure if the exact reason is still known. But it sounds reasonable to remove the `ThreadCritical`.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25035#pullrequestreview-2817361509

From jbhateja at openjdk.org Tue May 6 08:49:57 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 6 May 2025 08:49:57 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v13]
In-Reply-To:
References:
Message-ID:

> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future.
> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it.
> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions.
> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>
> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>
> The patch has been regression-tested with tier1 and jvmci tests.
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
>
> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

Review comments resolutions

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/24329/files
- new: https://git.openjdk.org/jdk/pull/24329/files/7b414b8c..b25cc776

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=12
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=11-12

Stats: 441 lines in 9 files changed: 106 ins; 107 del; 228 mod
Patch: https://git.openjdk.org/jdk/pull/24329.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329

PR: https://git.openjdk.org/jdk/pull/24329

From jbhateja at openjdk.org Tue May 6 08:49:58 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 6 May 2025 08:49:58 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v12]
In-Reply-To:
References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com>
Message-ID:

On Tue, 6 May 2025 00:30:23 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision:
>>
>> - Updating comment
>> - Review comments resolutions
>
> src/hotspot/cpu/x86/vm_version_x86.cpp line 1102:
>
>> 1100: size_t buf_iter = cpu_info_size;
>> 1101: for (uint64_t i = 0; i < features_vector_size(); i++) {
>> 1102: insert_features_names(features_vector_elem(i), buf + buf_iter, sizeof(buf) - buf_iter, _features_names, 64 * i);
>
> `Abstract_VM_Version::insert_features_names` is used only on x86. You can move it to `vm_version_x86.cpp/.hpp` and adjust to new layout.
DONE

> src/hotspot/share/runtime/abstract_vm_version.hpp line 51:
>
>> 49: class VM_Features {
>> 50: public:
>> 51: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE];
>
> Why did you decide to declare new type name for fixed size array type? I see you use `FeatureVector` in `vmStructs*` and JVMCI code. Does it make things simpler there?

Yes. I was facing compilation issues with raw array types.

> src/hotspot/share/runtime/abstract_vm_version.hpp line 91:
>
>> 89:
>> 90: // CPU feature flags vector, can be affected by VM settings.
>> 91: static VM_Features _vm_target_features;
>
> Unless we plan to migrate all platforms all at once, I suggest moving this code into `VM_Version` and keeping the same names (`_features` and `_cpu_features`). Ideally, the `_features` field can be moved from `Abstract_VM_Version` to platform-specific `VM_Version`s across all platforms. But leaving it as is for now is also fine with me.
>
> There's a precedent: `VM_Version` already overrides `_features` field on s390 [1].
>
> `VM_Features` class can start as x86-specific, but for advertisement purposes it makes sense to keep it in `abstract_vm_version.hpp`.
>
> Alternatively, `Abstract_VM_Version::_features` can be converted from `uint64_t` to `VM_Features` and non-x86 platforms can be covered by providing overloads for currently used operators (it's mostly `|=`, `&=`, and `&`, plus conversions).
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/s390/vm_version_s390.hpp#L130

Moved VM_Features to VM_Version.
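The fixed-size feature vector discussed in these review comments (an array of 64-bit words whose size is known at compile time) can be sketched as follows. This is only an illustration of the data structure, written in Java rather than HotSpot's C++; `FeatureVectorSketch`, `FEATURE_COUNT`, and the method names are hypothetical and do not mirror the actual `VM_Features` class.

```java
final class FeatureVectorSketch {
    // Hypothetical number of feature flags; anything above 64 needs more than one word.
    static final int FEATURE_COUNT = 70;
    // Word count is derived from the flag count and rounded up, so it grows
    // automatically when FEATURE_COUNT crosses a multiple of 64.
    static final int WORD_COUNT = (FEATURE_COUNT + 63) / 64;

    // Backing storage has a fixed size; no resizing or sharing is needed.
    private final long[] words = new long[WORD_COUNT];

    private static void check(int feature) {
        if (feature < 0 || feature >= FEATURE_COUNT) {
            throw new IndexOutOfBoundsException("feature " + feature);
        }
    }

    void set(int feature) {
        check(feature);
        words[feature >>> 6] |= 1L << (feature & 63);   // word index, then bit within the word
    }

    void clear(int feature) {
        check(feature);
        words[feature >>> 6] &= ~(1L << (feature & 63));
    }

    boolean supports(int feature) {
        check(feature);
        return (words[feature >>> 6] & (1L << (feature & 63))) != 0;
    }

    public static void main(String[] args) {
        FeatureVectorSketch features = new FeatureVectorSketch();
        features.set(65);                          // lands in the second word
        System.out.println(features.supports(65)); // true
        System.out.println(features.supports(1));  // false
    }
}
```

Feature `i` lives in word `i / 64` at bit `i % 64`, which is the same layout implied by the `64 * i` offset in the quoted `insert_features_names` loop.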
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075014479
PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075012045
PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075015126

From jbhateja at openjdk.org Tue May 6 08:49:58 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 6 May 2025 08:49:58 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v13]
In-Reply-To:
References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com>
Message-ID:

On Tue, 6 May 2025 00:57:29 GMT, Vladimir Ivanov wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments resolutions
>
> src/hotspot/cpu/x86/vm_version_x86.hpp line 707:
>
>> 705: //
>> 706: static bool supports_cpuid() { return _features != 0; }
>> 707: static bool supports_cmov() { return (_features & CPU_CMOV) != 0; }
>
> Since you touch this code anyway, I suggest using this opportunity to automatically derive this code using `CPU_FEATURE_FLAGS` macro. (As an example [1].)
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp#L147

Unlike AARCH64, there is not a 1:1 mapping between CPU_* features and the corresponding support checkers; some AVX512 checkers use multiple features. Skipping this for now for consistency.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075012161

From jbhateja at openjdk.org Tue May 6 08:56:21 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 6 May 2025 08:56:21 GMT
Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5]
In-Reply-To:
References:
Message-ID:

On Mon, 5 May 2025 14:22:57 GMT, Jatin Bhateja wrote:

>>> Let me tune this check and update the test.
>>
>> For me to approve this code, you will have to do more than that. I will need:
>> - Proof of the implemented logic.
>> - More tests.
>
> Hi @eme64, can you kindly run this latest version through your testing, please.

> @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;)

I have added comments in the code which give sufficient details, let me know if you still need more explanation

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2853769295

From jbhateja at openjdk.org Tue May 6 08:57:18 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 6 May 2025 08:57:18 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10]
In-Reply-To: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com>
References: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com>
Message-ID:

On Sat, 3 May 2025 08:13:11 GMT, Vladimir Ivanov wrote:

>>> Ok, thanks! I wasn't sure you finished the pass.
>>>
>>> I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backed array. (Also, `pre_initialize()` is not needed as well.)
>>
>> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit?
>
>> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit?
>
> Yes, please.
> (The limit may be precise - number of elements in Feature_Flag enum - but the logic which computes the size of backing array can automatically round it and bump the size once the actual limit is reached.)
>
>> pre_initialize was put in place because codeCache_init() precedes VM_Version_init()
>
> I wanted to say that the sole purpose of `pre_initialize` is to allocate memory. Once it goes away, there's no reason to keep it.

Hi @iwanowww, your comments have been addressed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2853772762

From qamai at openjdk.org Tue May 6 09:02:18 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 6 May 2025 09:02:18 GMT
Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v7]
In-Reply-To: <7sL3-TEh2o6nT6GvvjYUpQfBbqbzeXgrJST9JeAcjLc=.df22b69d-6087-44c6-883d-e0604b92a44d@github.com>
References: <7sL3-TEh2o6nT6GvvjYUpQfBbqbzeXgrJST9JeAcjLc=.df22b69d-6087-44c6-883d-e0604b92a44d@github.com>
Message-ID:

On Mon, 5 May 2025 14:21:07 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This bugfix patch fixes incorrect value computation for the Integer/Long.compress APIs.
>>
>> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type.
>>
>> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning.
>>
>> Kindly review and share your feedback.
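For readers unfamiliar with the operation under discussion, the following is a plain loop-based reference sketch of bit compression in Java. It is not the JDK intrinsic and not the C2 value-range logic; `CompressSketch` is a hypothetical name, and the code only shows what the operation computes and why a mask with k set bits limits the result to the k low-order bits.

```java
final class CompressSketch {
    // Gathers the bits of src selected by mask into the low-order bits of the
    // result, preserving their relative order.
    static int compress(int src, int mask) {
        int result = 0;
        int outBit = 0;
        while (mask != 0) {
            int lowest = mask & -mask;      // isolate the lowest set bit of the mask
            if ((src & lowest) != 0) {
                result |= 1 << outBit;      // copy that source bit to the next output position
            }
            outBit++;
            mask &= mask - 1;               // clear the bit just processed
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compress(0b1011_0110, 0b1111_0000)); // 11, i.e. 0b1011
        System.out.println(compress(-1, 0b1010));               // 3: two mask bits, so at most two result bits
    }
}
```

The second call shows the kind of bound a range analysis can exploit: whatever the source value is, the result cannot use more low-order bits than the mask has set bits.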
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Adding additional test

src/hotspot/share/opto/intrinsicnode.cpp line 302:

> 300: // res.hi = MIN(res.hi, (1L << result_bit_width) - 1)
> 301: hi = src_type->hi_as_long() >= 0 ? src_type->hi_as_long() : hi;
> 302: hi = result_bit_width < mask_bit_width ? MIN2((jlong)((1L << result_bit_width) - 1L), hi) : hi;

Note that if `result_bit_width == 63`, this computation will do `min_jlong - 1` which is UB.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2075035508

From duke at openjdk.org Tue May 6 09:05:21 2025
From: duke at openjdk.org (snake66)
Date: Tue, 6 May 2025 09:05:21 GMT
Subject: RFR: 8356182: Build fails on aarch64 without ZGC
In-Reply-To: <1EsTsRhVHlC5lsFXRZXBGff4nOPUj-hyAE6VivJpw3w=.4fcfc225-2bda-403b-bf26-6d01347d5e38@github.com>
References: <1EsTsRhVHlC5lsFXRZXBGff4nOPUj-hyAE6VivJpw3w=.4fcfc225-2bda-403b-bf26-6d01347d5e38@github.com>
Message-ID:

On Mon, 5 May 2025 14:24:12 GMT, Christian Hagedorn wrote:

>> jvmciCodeInstaller_aarch64.cpp references symbols defined by the ZGC unconditionally, causing the build to fail when ZGC is not included.
>>
>> This work is sponsored by The FreeBSD Foundation
>
> Looks good to me, too.

@chhagedorn Thank you!
:)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25039#issuecomment-2853792884

From duke at openjdk.org Tue May 6 09:08:24 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Tue, 6 May 2025 09:08:24 GMT
Subject: Integrated: 8258229: Crash in nmethod::reloc_string_for
In-Reply-To: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com>
References: <6wxhOTq8-vRcBjfw6HdHD9nZzwdT7SgvXfgnQseFF7w=.05dc5242-cad0-4a6a-a96b-9754b2edc927@github.com>
Message-ID:

On Wed, 23 Apr 2025 15:12:54 GMT, Manuel Hässig wrote:

> ## Issue Summary
>
> The issue manifests in intermittent failures of test cases with `-XX:+PrintAssembly`. The reason for these intermittent failures is a deoptimization of the method before or during printing its assembly. If that deoptimization makes the method not entrant, the entry of that method is patched, but the relocation information is not updated. If the instruction at the method entry before patching had relocation info that prints a comment during assembly printing, printing that comment for the patched entry fails if the operands of the original and patched instructions do not match.
>
> ## Change Summary
>
> To fix this issue, this PR updates the relocation info when patching the method entry. To avoid any races between printing and deoptimizing, this PR acquires the `NMethodState_lock` for printing an `nmethod`.
>
> All changes of this PR summarized:
> - add a regression test,
> - update the relocation information after patching the method entry for making it not entrant,
> - acquire the `NMethodState_lock` in `print_nmethod()` to avoid changing the relocation information during printing.
>
> ## Testing
>
> I ran tiers 1 through 3 and Oracle internal testing.

This pull request has now been integrated.
Changeset: 1eee15ee
Author: Manuel Hässig
Committer: SendaoYan
URL: https://git.openjdk.org/jdk/commit/1eee15eea692f57e35dd785bdd491411746ae3f1
Stats: 93 lines in 2 files changed: 93 ins; 0 del; 0 mod

8258229: Crash in nmethod::reloc_string_for

Reviewed-by: galder, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/24831

From mli at openjdk.org Tue May 6 09:12:16 2025
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 6 May 2025 09:12:16 GMT
Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB [v2]
In-Reply-To:
References:
Message-ID:

> Hi,
> Can you help to review this patch to add SUADD/SADD/SUSUB/SSUB for vector api?
>
> Thanks!
>
> ## Test
> data
>
> Benchmark | (size) | Mode | Cnt | Score - master | Score - patch | improvement (master/patch)
> -- | -- | -- | -- | -- | -- | --
> ByteMaxVector.SADD | 1024 | avgt | 10 | 23693.941 | 381.441 | 62.117
> ByteMaxVector.SSUB | 1024 | avgt | 10 | 24067.009 | 379.836 | 63.362
> ByteMaxVector.SUADD | 1024 | avgt | 10 | 24131.819 | 382.678 | 63.06
> ByteMaxVector.SUSUB | 1024 | avgt | 10 | 23140.494 | 380.768 | 60.773
> IntMaxVector.SADD | 1024 | avgt | 10 | 88526.058 | 1378.77 | 64.207
> IntMaxVector.SSUB | 1024 | avgt | 10 | 94204.768 | 1383.613 | 68.086
> IntMaxVector.SUADD | 1024 | avgt | 10 | 82470.743 | 1384.668 | 59.56
> IntMaxVector.SUSUB | 1024 | avgt | 10 | 84443.805 | 1759.69 | 47.988
> LongMaxVector.SADD | 1024 | avgt | 10 | 187690.117 | 3770.84 | 49.774
> LongMaxVector.SSUB | 1024 | avgt | 10 | 187334.716 | 3814.869 | 49.106
> LongMaxVector.SUADD | 1024 | avgt | 10 | 186891.578 | 2747.753 | 68.016
> LongMaxVector.SUSUB | 1024 | avgt | 10 | 186092.582 | 2730.588 | 68.151
> ShortMaxVector.SADD | 1024 | avgt | 10 | 43991.814 | 726.703 | 60.536
> ShortMaxVector.SSUB | 1024 | avgt | 10 | 40560.356 | 730.238 | 55.544
> ShortMaxVector.SUADD | 1024 | avgt | 10 | 43349.632 | 729.758 | 59.403
> ShortMaxVector.SUSUB | 1024 | avgt | 10 | 42686.701 | 726.059 | 58.792
>
>

Hamlin Li has updated the pull
request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25005/files - new: https://git.openjdk.org/jdk/pull/25005/files/7648da04..466eb06c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25005&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25005&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25005.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25005/head:pull/25005 PR: https://git.openjdk.org/jdk/pull/25005 From mli at openjdk.org Tue May 6 09:12:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 6 May 2025 09:12:26 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 10:17:36 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> minor > > src/hotspot/cpu/riscv/riscv_v.ad line 696: > >> 694: match(Set dst_src (SaturatingAddV (Binary dst_src src1) v0)); >> 695: ins_cost(VEC_COST); >> 696: format %{ "vsadd_masked $dst_src, $dst_src, $src1" %} > > Nit: Seems the mask register (`v0`) is missing in opto asm for these masked operations. > For integrity, we always print the mask register as the last operand for other masked nodes. > `format %{ "vsadd_masked $dst_src, $dst_src, $src1, $v0" %}` Fixed. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25005#discussion_r2075049744 From mli at openjdk.org Tue May 6 09:13:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 6 May 2025 09:13:15 GMT Subject: RFR: 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java In-Reply-To: References: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> Message-ID: On Mon, 5 May 2025 09:29:36 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? 
>> Originally, I was going to enable all test cases on riscv in this test file. But seems there was already a try to implement RoundDoubleModeV (which is IRNode.ROUND_DOUBLE_MODE_V) in https://github.com/openjdk/jdk/pull/21164, but failed because of some performance regression. >> So I'll just enable part of test cases in this pr. >> >> Thanks! > > LGTM. Thanks. Thank you @RealFYang @mhaessig ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24983#issuecomment-2853820512 From epeter at openjdk.org Tue May 6 09:39:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 09:39:20 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 08:53:48 GMT, Jatin Bhateja wrote: > > @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;) > > I have added comments in the code which give sufficient details, let me know if you still need more explanation @jatin-bhateja I assume you are referring to the comments here: // Following rules applies to upper bound estimation of results value range // res.hi = src.hi iff src.hi > 0 else max_value // if result_bit_width < mask_bit_width, then we can further constrain res.hi as follows. // res.hi = MIN(res.hi, (1L << result_bit_width) - 1) hi = src_type->hi_as_long() >= 0 ? src_type->hi_as_long() : hi; hi = result_bit_width < mask_bit_width ? MIN2((jlong)((1L << result_bit_width) - 1L), hi) : hi; To me this looks more like simply a "statement" `we can`. It may be correct, or may be not. Even @merykitty says this is non-trivial, and he is a bit-magic master. There have now been multiple bugs in this code: - I filed this bug after finding it with my Template based Fuzzer. - Then I found a bug in your fix here. - And now @merykitty even found UB in this code. 
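Editorial sketch, not part of the thread: the quoted expression `(1L << result_bit_width) - 1L` shifts a signed value, which becomes a problem once the bit width reaches 63. One common way to avoid that, assuming the width is always in the range 0..63, is to build the mask in unsigned arithmetic and convert afterwards. The helper names `max_value_for_bit_width` and `clamp_hi` below are hypothetical and do not come from the PR.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not from the PR): largest value that fits in `bits`
// low-order bits. Assumes 0 <= bits <= 63. The shift and the subtraction are
// done on uint64_t, where both are well defined for this range.
inline int64_t max_value_for_bit_width(int bits) {
  const uint64_t mask = (uint64_t{1} << bits) - 1;
  // For bits <= 63 the mask is at most INT64_MAX, so the conversion is exact.
  return static_cast<int64_t>(mask);
}

// Hypothetical mirror of the quoted clamp: tighten `hi` only when the result
// is known to use fewer bits than the mask.
inline int64_t clamp_hi(int64_t hi, int result_bit_width, int mask_bit_width) {
  if (result_bit_width < mask_bit_width) {
    return std::min(max_value_for_bit_width(result_bit_width), hi);
  }
  return hi;
}
```

This only addresses the shift itself; it says nothing about whether the clamping rule is correct, which is the question the reviewers are asking to have proven.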
As I asked above: > We got this code wrong before, and now again. How can we gain confidence that it will be correct on the next attempt? You are claiming **that** your code is true, but you do not give proof for it. What I am looking for is **why** it is correct, and correct for all cases. To repeat myself from above: > For me to approve this code, you will have to do more than that. I will need: > - Proof of the implemented logic. > - More tests. @merykitty has pointed you to the tests here: https://github.com/openjdk/jdk/pull/23089 Have you responded to that yet? I agree with him that we need such tests. I would point you specifically to the range verification like this: int sum = 0; if (z > LIMIT_1) { sum += 1; } if (z > LIMIT_2) { sum += 2; } if (z > LIMIT_3) { sum += 4; } if (z > LIMIT_4) { sum += 8; } if (z > LIMIT_5) { sum += 16; } if (z > LIMIT_6) { sum += 32; } if (z > LIMIT_7) { sum += 64; } if (z > LIMIT_8) { sum += 128; } If a limit wrongly constant folds, then we get wrong results, by either always adding or never adding to the `sum`. This allows us to do checks against ranges. ------------------- An alternative is always to "backout" the broken optimization, as @merykitty suggested: > @jatin-bhateja This operation is non-trivial, I expect the level of coverage to be on par with https://github.com/openjdk/jdk/pull/23089. If you want to have a quick fix, I suggest removing all the logic and simply returning the bottom type. I.e. you could back out the problematic part of the changes from https://github.com/openjdk/jdk/pull/8498. We may want to revisit the `expand / compress` optimizations once we have `KnownBits` anyway, and we are quite close with it now https://github.com/openjdk/jdk/pull/17508. But even then: we will need good tests eventually.
And with `KnownBits` we do not only need the "range" verification I showed above from https://github.com/openjdk/jdk/pull/23089, but we would have to do something similar with bits, to check if the type has one bit as always one or zero, which could lead to wrong constant folding below. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2853896251 From galder at openjdk.org Tue May 6 09:41:19 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 6 May 2025 09:41:19 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> Message-ID: <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> On Mon, 5 May 2025 09:02:30 GMT, Daniel Lundén wrote: >> Ah, good catch. Let me try to verify that my new asserts also trigger if I revert the fix for [JDK-8260420](https://bugs.openjdk.org/browse/JDK-8260420). > > Unfortunately, it was not straightforward to revert the fix for [JDK-8260420](https://bugs.openjdk.org/browse/JDK-8260420) (too many changes since then). If `!LCA_orig->dominates(pred_block) || early->dominates(pred_block)` failed at some point, then the new assert `early->dominates(LCA_orig)` must also fail in that situation (in theory). See the details in my other response above. Ok, maybe I was not clear enough. The comment says: > // Triggers an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification is disabled: My question is, does the test on top of which the comment is placed (`test4` right?) ever run with loop strip mining verification disabled and if it does, does the assert get triggered?
If `test4` does not do this, seems to me it would be nice to have an additional test that verifies just that rather than accept/assume the comment as valid without a test that actually verifies this. Thoughts? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2075102963 From epeter at openjdk.org Tue May 6 09:43:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 09:43:19 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 08:53:48 GMT, Jatin Bhateja wrote: >> Hi @eme64 , can you kindly run this latest version through your testing, please. > >> @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;) > > I have added comments in the code which give sufficient details, let me know if you still need more explanation @jatin-bhateja Maybe it is not very clear to you what I mean by "proof". Here an example I worked on with @merykitty : https://github.com/openjdk/jdk/pull/17508, especially see the explanations and proofs in `adjust_lo`. Basically every line has an explanation, and a proof. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2853912265 From epeter at openjdk.org Tue May 6 09:49:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 09:49:14 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 08:53:48 GMT, Jatin Bhateja wrote: >> Hi @eme64 , can you kindly run this latest version through your testing, please. 
> >> @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;) > > I have added comments in the code which give sufficient details, let me know if you still need more explanation @jatin-bhateja I really don't want you to feel forced to do anything here. If you don't want to write the tests or proofs, then I would suggest just to "backout" the problematic changes. I'm sure someone else will do both proofs and tests once we can do these optimizations even more powerfully with `KnownBits`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2853925941 From jbhateja at openjdk.org Tue May 6 09:55:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 09:55:21 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception.
> > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > Member Hi @xmas92, Your suggestion looks good to me for this bugfix. I think we can improve upon the existing implementation as part of JDK-8355341 since it's a bigger change and also involves Graal buy-in. There is still a possibility of incorrect relocation sharing with subsequent relocatable instructions in other cases, e.g. OR instruction for which we bookkeep the relocation address from the end of the instruction, and it's the last instruction in the pointer coloring primitive. For this bug fix, your suggestion looks fine to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2853945841 From shade at openjdk.org Tue May 6 09:57:47 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 09:57:47 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level Message-ID: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously.
However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. Additional testing: - [x] Eyeballing `-Xlog:jit*` logs after the patch - [ ] Linux x86_64 server fastdebug, `all` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25061/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25061&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356259 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/25061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25061/head:pull/25061 PR: https://git.openjdk.org/jdk/pull/25061 From jbhateja at openjdk.org Tue May 6 10:21:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 10:21:54 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. 
[2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. > > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. 
The pull request contains one new commit since the last revision: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24919/files - new: https://git.openjdk.org/jdk/pull/24919/files/1f9c84c8..fc3b61e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24919&range=00-01 Stats: 25 lines in 4 files changed: 11 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24919.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24919/head:pull/24919 PR: https://git.openjdk.org/jdk/pull/24919 From jbhateja at openjdk.org Tue May 6 10:31:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 10:31:15 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 08:53:48 GMT, Jatin Bhateja wrote: >> Hi @eme64 , can you kindly run this latest version through your testing, please. > >> @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;) > > I have added comments in the code which give sufficient details, let me know if you still need more explanation > @jatin-bhateja I really don't want you to feel forced to do anything here. If you don't want to write the tests or proofs, then I would suggest just to "backout" the problematic changes ?? I'm sure someone else will do both proofs and tests once we can do these optimizations even more powerfully with `KnownBits`. Hi @eme64 , Thanks for your pointers, let me do the needful. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2854061476 From jbhateja at openjdk.org Tue May 6 10:37:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 10:37:15 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v7] In-Reply-To: References: <7sL3-TEh2o6nT6GvvjYUpQfBbqbzeXgrJST9JeAcjLc=.df22b69d-6087-44c6-883d-e0604b92a44d@github.com> Message-ID: On Tue, 6 May 2025 08:59:24 GMT, Quan Anh Mai wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding additional test > > src/hotspot/share/opto/intrinsicnode.cpp line 302: > >> 300: // res.hi = MIN(res.hi, (1L << result_bit_width) - 1) >> 301: hi = src_type->hi_as_long() >= 0 ? src_type->hi_as_long() : hi; >> 302: hi = result_bit_width < mask_bit_width ? MIN2((jlong)((1L << result_bit_width) - 1L), hi) : hi; > > Note that if `result_bit_width == 63`, this computation will do `min_jlong - 1` which is UB. Thanks @merykitty , I liked your to-the-point, informative and crisp comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2075188291 From fyang at openjdk.org Tue May 6 10:41:15 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 6 May 2025 10:41:15 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 09:12:16 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to add SUADD/SADD/SUSUB/SSUB for vector api? >> >> Thanks!
>> >> ## Test >> data >> >> Benchmark | (size) | Mode | Cnt | Score - master | Score - patch | improvement (master/patch) >> -- | -- | -- | -- | -- | -- | -- >> ByteMaxVector.SADD | 1024 | avgt | 10 | 23693.941 | 381.441 | 62.117 >> ByteMaxVector.SSUB | 1024 | avgt | 10 | 24067.009 | 379.836 | 63.362 >> ByteMaxVector.SUADD | 1024 | avgt | 10 | 24131.819 | 382.678 | 63.06 >> ByteMaxVector.SUSUB | 1024 | avgt | 10 | 23140.494 | 380.768 | 60.773 >> IntMaxVector.SADD | 1024 | avgt | 10 | 88526.058 | 1378.77 | 64.207 >> IntMaxVector.SSUB | 1024 | avgt | 10 | 94204.768 | 1383.613 | 68.086 >> IntMaxVector.SUADD | 1024 | avgt | 10 | 82470.743 | 1384.668 | 59.56 >> IntMaxVector.SUSUB | 1024 | avgt | 10 | 84443.805 | 1759.69 | 47.988 >> LongMaxVector.SADD | 1024 | avgt | 10 | 187690.117 | 3770.84 | 49.774 >> LongMaxVector.SSUB | 1024 | avgt | 10 | 187334.716 | 3814.869 | 49.106 >> LongMaxVector.SUADD | 1024 | avgt | 10 | 186891.578 | 2747.753 | 68.016 >> LongMaxVector.SUSUB | 1024 | avgt | 10 | 186092.582 | 2730.588 | 68.151 >> ShortMaxVector.SADD | 1024 | avgt | 10 | 43991.814 | 726.703 | 60.536 >> ShortMaxVector.SSUB | 1024 | avgt | 10 | 40560.356 | 730.238 | 55.544 >> ShortMaxVector.SUADD | 1024 | avgt | 10 | 43349.632 | 729.758 | 59.403 >> ShortMaxVector.SUSUB | 1024 | avgt | 10 | 42686.701 | 726.059 | 58.792 >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Updated change looks good. What about the vector-scalar variants (vsaddu.vx, vsaddu.vi, etc.)? Do they help in any way? BTW: Nice JMH numbers :-) ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25005#pullrequestreview-2817781732 From jbhateja at openjdk.org Tue May 6 11:19:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 11:19:54 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback.
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: build fixes for non-x86 targets ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/b25cc776..650e3d61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From mli at openjdk.org Tue May 6 11:20:15 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 6 May 2025 11:20:15 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB [v2] In-Reply-To: References: Message-ID: <1Da_mkv8g0xGl13SPBP1Bo1EfDodNNOtXgt_lO8PaCU=.5a2cc392-94f4-4550-92ea-e04998acdda0@github.com> On Tue, 6 May 2025 10:39:04 GMT, Fei Yang wrote: > Updated change looks good. Thank you! > What about the vector-scalar variants (vsaddu.vx, vsaddu.vi, etc.)? Do they help in any way? I think so, although not sure how much benefit it will bring, as it should be able to do a vmv first, then use the instructs in this patch, so there should be some improvement, but maybe just minor one. And for other operations, like (signed/unsigned) max/min, mulb/s/i/l/f/d, and so on, I think we can also introduce the _vx and _vi version. Maybe we could implement these bunch of instructs in another patch together? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25005#issuecomment-2854193307 From qamai at openjdk.org Tue May 6 11:50:19 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 6 May 2025 11:50:19 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:19:54 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > build fixes for non-x86 targets src/hotspot/cpu/x86/vm_version_x86.hpp line 37: > 35: class VM_Features { > 36: public: > 37: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; Do you think it would be better to refactor this into a separate class analogous to `std::bitset`? You can start with only implementing `test`, `set`, `reset`. This would help in other use cases, too. https://en.cppreference.com/w/cpp/utility/bitset ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075295556 From coleenp at openjdk.org Tue May 6 11:51:19 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 11:51:19 GMT Subject: RFR: 8356172: IdealGraphPrinter doesn't need ThreadCritical In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:27:22 GMT, Coleen Phillimore wrote: > Please review this possibly trivial change. > Tested with hs-precheckin-comp test list. Thanks for the review Roberto and Christian. We used to have a lot more ThreadCriticals scattered around this code so I think this was just another place that one landed, I don't see what it could have protected now relative to the other ThreadCritical that we still have. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25035#issuecomment-2854275217 From coleenp at openjdk.org Tue May 6 11:51:20 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 6 May 2025 11:51:20 GMT Subject: Integrated: 8356172: IdealGraphPrinter doesn't need ThreadCritical In-Reply-To: References: Message-ID: <5ZdierYx0rcDYzAc-OFKAlqZM6ow9bK8pS6J-jmNsEI=.1abc7aa0-f299-42e2-9a6e-0c1b7f44b6db@github.com> On Mon, 5 May 2025 12:27:22 GMT, Coleen Phillimore wrote: > Please review this possibly trivial change. > Tested with hs-precheckin-comp test list. This pull request has now been integrated. Changeset: ddd07b10 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/ddd07b107e814ec846579a66d4f2005b7db9bb2f Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8356172: IdealGraphPrinter doesn't need ThreadCritical Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25035 From qamai at openjdk.org Tue May 6 11:54:22 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 6 May 2025 11:54:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:19:54 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel®
AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > build fixes for non-x86 targets src/hotspot/cpu/x86/vm_version_x86.hpp line 44: > 42: // log2 of feature vector element size in bits, used by JVMCI to check enabled feature bits. > 43: // Refer HotSpotJVMCIBackendFactory::convertFeaturesVector. > 44: static uint32_t _features_vector_element_shift_count; Making this `static constexpr` helps constant folding, too. 
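Editorial sketch, not part of the thread: the two review suggestions above (a small `std::bitset`-like wrapper offering `test`, `set` and `reset`, and a `static constexpr` shift count) could look roughly like the following. The class name `FeatureBits`, its template parameter and its member names are hypothetical; this is not the `VM_Features` class from the PR and it does not use HotSpot's own utilities.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bitset-like wrapper over an array of 64-bit words.
// Callers are assumed to pass bit indices below NumWords * 64.
template <std::size_t NumWords>
class FeatureBits {
 public:
  // log2 of the element size in bits; a compile-time constant, so index
  // arithmetic that uses it can be folded by the compiler.
  static constexpr uint32_t element_shift_count = 6;
  static constexpr uint64_t element_index_mask = (uint64_t{1} << element_shift_count) - 1;

  // Returns true if the given feature bit is set.
  bool test(std::size_t bit) const {
    return ((_words[bit >> element_shift_count] >> (bit & element_index_mask)) & 1u) != 0;
  }

  // Sets the given feature bit.
  void set(std::size_t bit) {
    _words[bit >> element_shift_count] |= uint64_t{1} << (bit & element_index_mask);
  }

  // Clears the given feature bit.
  void reset(std::size_t bit) {
    _words[bit >> element_shift_count] &= ~(uint64_t{1} << (bit & element_index_mask));
  }

 private:
  uint64_t _words[NumWords] = {};
};
```

A caller would write, for example, `FeatureBits<2> f; f.set(70);` and later query `f.test(70)`. Whether such a wrapper is worth it over a raw array depends on how the JVMCI side consumes the words, which this sketch does not model.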
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075301116 From jbhateja at openjdk.org Tue May 6 12:09:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 12:09:21 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:47:47 GMT, Quan Anh Mai wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> build fixes for non-x86 targets > > src/hotspot/cpu/x86/vm_version_x86.hpp line 37: > >> 35: class VM_Features { >> 36: public: >> 37: using FeatureVector = uint64_t [MAX_FEATURE_VEC_SIZE]; > > Do you think it would be better to refactor this into a separate class analogous to `std::bitset`? You can start with only implementing `test`, `set`, `reset`. This would help in other use cases, too. > > https://en.cppreference.com/w/cpp/utility/bitset In essence, what we have currently is a bitmap implementation, but its utility is limited to VM_Version for now. The current approach simplifies the JVMCI side of handling. We have an existing utility for bitset src/hotspot/share/utilities/bitMap.hpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075325468 From rkennke at openjdk.org Tue May 6 12:22:53 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 12:22:53 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 Message-ID: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. 
Testing: - [x] Build without Shenandoah GC ------------- Commit messages: - 8356266: Fix non-Shenandoah build after JDK-8356075 Changes: https://git.openjdk.org/jdk/pull/25064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356266 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25064/head:pull/25064 PR: https://git.openjdk.org/jdk/pull/25064 From dnsimon at openjdk.org Tue May 6 12:46:16 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 6 May 2025 12:46:16 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: <108F8BKi1AuttNCA6a1RxJYTIVnP0phMzeaUNoHMq9Q=.44c15e9a-a8d7-42c7-97c2-f1eb0b6b5e04@github.com> On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC Marked as reviewed by dnsimon (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25064#pullrequestreview-2818123867 From luhenry at openjdk.org Tue May 6 13:10:26 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 6 May 2025 13:10:26 GMT Subject: RFR: 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java In-Reply-To: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> References: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> Message-ID: <9hu46rG_cR6lH81nYW_J05IE_Vs8I6A1zJ7jPqxWQ4g=.81048b8c-04ab-4a17-a590-11b520050b15@github.com> On Thu, 1 May 2025 11:31:50 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Originally, I was going to enable all test cases on riscv in this test file. But seems there was already a try to implement RoundDoubleModeV (which is IRNode.ROUND_DOUBLE_MODE_V) in https://github.com/openjdk/jdk/pull/21164, but failed because of some performance regression. > So I'll just enable part of test cases in this pr. > > Thanks! Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24983#pullrequestreview-2818216170 From luhenry at openjdk.org Tue May 6 13:10:29 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 6 May 2025 13:10:29 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 09:12:16 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to add SUADD/SADD/SUSUB/SSUB for vector api? >> >> Thanks! 
>> >> ## Test >> data >> >> Benchmark | (size) | Mode | Cnt | Score - master | Score - patch | improvement (master/patch) >> -- | -- | -- | -- | -- | -- | -- >> ByteMaxVector.SADD | 1024 | avgt | 10 | 23693.941 | 381.441 | 62.117 >> ByteMaxVector.SSUB | 1024 | avgt | 10 | 24067.009 | 379.836 | 63.362 >> ByteMaxVector.SUADD | 1024 | avgt | 10 | 24131.819 | 382.678 | 63.06 >> ByteMaxVector.SUSUB | 1024 | avgt | 10 | 23140.494 | 380.768 | 60.773 >> IntMaxVector.SADD | 1024 | avgt | 10 | 88526.058 | 1378.77 | 64.207 >> IntMaxVector.SSUB | 1024 | avgt | 10 | 94204.768 | 1383.613 | 68.086 >> IntMaxVector.SUADD | 1024 | avgt | 10 | 82470.743 | 1384.668 | 59.56 >> IntMaxVector.SUSUB | 1024 | avgt | 10 | 84443.805 | 1759.69 | 47.988 >> LongMaxVector.SADD | 1024 | avgt | 10 | 187690.117 | 3770.84 | 49.774 >> LongMaxVector.SSUB | 1024 | avgt | 10 | 187334.716 | 3814.869 | 49.106 >> LongMaxVector.SUADD | 1024 | avgt | 10 | 186891.578 | 2747.753 | 68.016 >> LongMaxVector.SUSUB | 1024 | avgt | 10 | 186092.582 | 2730.588 | 68.151 >> ShortMaxVector.SADD | 1024 | avgt | 10 | 43991.814 | 726.703 | 60.536 >> ShortMaxVector.SSUB | 1024 | avgt | 10 | 40560.356 | 730.238 | 55.544 >> ShortMaxVector.SUADD | 1024 | avgt | 10 | 43349.632 | 729.758 | 59.403 >> ShortMaxVector.SUSUB | 1024 | avgt | 10 | 42686.701 | 726.059 | 58.792 >> >> > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25005#pullrequestreview-2818215012 From luhenry at openjdk.org Tue May 6 13:11:21 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 6 May 2025 13:11:21 GMT Subject: RFR: 8355704: RISC-V: enable TestIRFma.java [v2] In-Reply-To: References: Message-ID: <9UHpK0ZeT6p6ONGbanYoxYvzU4m2dP1m4K2jLLYNf0s=.c48ace55-2792-4ddf-95dc-4f78d1e2ad31@github.com> On Thu, 1 May 2025 08:36:29 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch to enable TestIRFma.java? >> FmaF/D (checked by TestIRFma.java) are supported on riscv, but for some reason we can not enable it easily, but we should enable it. >> >> NOTE: the reason I change IRNode matching rules is that, previously it verify the `FINAL CODE` where every platform could have different instruct name; I change it from machOnlyNameRegex to beforeMatchingNameRegex, to make it verify the `PrintIdeal` where every platform share the same names. >> >> Also tested on machine with `asimd` support. >> >> Thanks! > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > adjust IR verification Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24947#pullrequestreview-2818217833 From rkennke at openjdk.org Tue May 6 13:18:23 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 13:18:23 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <108F8BKi1AuttNCA6a1RxJYTIVnP0phMzeaUNoHMq9Q=.44c15e9a-a8d7-42c7-97c2-f1eb0b6b5e04@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> <108F8BKi1AuttNCA6a1RxJYTIVnP0phMzeaUNoHMq9Q=.44c15e9a-a8d7-42c7-97c2-f1eb0b6b5e04@github.com> Message-ID: <_dnHV1rf65FfgcxrigE2RMCBOBu_YUq58SAdmB2as2k=.605e1266-5e2f-44da-8889-3658545d6c1b@github.com> On Tue, 6 May 2025 12:43:07 GMT, Doug Simon wrote: >> [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. >> >> Testing: >> - [x] Build without Shenandoah GC > > Marked as reviewed by dnsimon (Reviewer). Thanks, @dougxc! Is this trivial? Can I push this right away to fix the build? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25064#issuecomment-2854547042 From rkennke at openjdk.org Tue May 6 13:28:26 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 13:28:26 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC Some GHA failures - they look unrelated. Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25064#issuecomment-2854558817 PR Comment: https://git.openjdk.org/jdk/pull/25064#issuecomment-2854572905 From shade at openjdk.org Tue May 6 13:28:25 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 13:28:25 GMT Subject: RFR: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC Ah yes. Trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25064#pullrequestreview-2818264886 From rkennke at openjdk.org Tue May 6 13:28:26 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 6 May 2025 13:28:26 GMT Subject: Integrated: 8356266: Fix non-Shenandoah build after JDK-8356075 In-Reply-To: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> References: <9t9PKKEIz5lyztUpQjzlbAi218B71LKv2w-UvMikrF8=.987114a6-8e92-4193-910c-2688a8ecddcf@github.com> Message-ID: <5xNQWiQmV33cfOTCB2_pb5B66d7L7IK2MXWEN-Gnqy4=.181a933e-4a2a-4210-8610-f03d62828c8c@github.com> On Tue, 6 May 2025 12:17:44 GMT, Roman Kennke wrote: > [JDK-8356075](https://bugs.openjdk.org/browse/JDK-8356075) (see PR #25001) causes builds without Shenandoah GC to fail. It's missing an `#if INCLUDE_SHENANDOAHGC`. > > Testing: > - [x] Build without Shenandoah GC This pull request has now been integrated. 
Changeset: bfdafb76 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/bfdafb762661fad5746607aaf5b21d6d11c72ffc Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod 8356266: Fix non-Shenandoah build after JDK-8356075 Reviewed-by: dnsimon, shade ------------- PR: https://git.openjdk.org/jdk/pull/25064 From asmehra at openjdk.org Tue May 6 14:14:22 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 6 May 2025 14:14:22 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> On Mon, 5 May 2025 23:54:55 GMT, Vladimir Kozlov wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Remove irrelevant comment >> >> Signed-off-by: Ashutosh Mehra >> - Fix win64 compile failures >> >> Signed-off-by: Ashutosh Mehra >> - Fix AOTCodeFlags.java test >> >> Signed-off-by: Ashutosh Mehra >> - Fix compile failure in minimal config >> >> Signed-off-by: Ashutosh Mehra >> - Revert back changes that added AOTRuntimeConstants. >> Ensure CompressedOops::base and CompressedKlssPointers::base does not >> change in production run >> >> Signed-off-by: Ashutosh Mehra >> - Fix merge conflicts >> >> Signed-off-by: Ashutosh Mehra >> - Store/load AsmRemarks and DbgStrings in aot code cache >> >> Signed-off-by: Ashutosh Mehra >> - Add missing external address in aarch64 >> >> Signed-off-by: Ashutosh Mehra >> - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab > > I attached hs_err file to RFE in JBS. @vnkozlov thanks for testing the patch. > % make test JTREG=AOT_JDK=true CONF=fastdebug TEST=compiler/c2/cr6865031/Test.java I can recreate this test locally. Looking into it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2854731804 From mbaesken at openjdk.org Tue May 6 14:42:26 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 6 May 2025 14:42:26 GMT Subject: RFR: 8356269: Issues after JDK-8295470 Message-ID: There are some issues with [JDK-8295470](https://bugs.openjdk.org/browse/JDK-8295470) https://wiki.openjdk.org/display/CodeTools/jcstress seems to be dead now (also used in TestGenerator.java). There is a typo hhttps at one place, needs to be fixed. ------------- Commit messages: - JDK-8356269 Changes: https://git.openjdk.org/jdk/pull/25068/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25068&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356269 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25068.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25068/head:pull/25068 PR: https://git.openjdk.org/jdk/pull/25068 From chagedorn at openjdk.org Tue May 6 15:34:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 15:34:12 GMT Subject: RFR: 8356269: Issues after JDK-8295470 In-Reply-To: References: Message-ID: On Tue, 6 May 2025 14:37:37 GMT, Matthias Baesken wrote: > There are some issues with [JDK-8295470](https://bugs.openjdk.org/browse/JDK-8295470) > https://wiki.openjdk.org/display/CodeTools/jcstress seems to be dead now (also used in TestGenerator.java). > There is a typo hhttps at one place, needs to be fixed. Looks good to me. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25068#pullrequestreview-2818734569 From thartmann at openjdk.org Tue May 6 15:35:26 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 6 May 2025 15:35:26 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v16] In-Reply-To: <0c36D3W-1GHo4gbSmHye1tJKsDxJLjG3uxt0Hb0Qxpo=.6ce172e4-2e95-44d3-a293-315dcb8e64a5@github.com> References: <0c36D3W-1GHo4gbSmHye1tJKsDxJLjG3uxt0Hb0Qxpo=.6ce172e4-2e95-44d3-a293-315dcb8e64a5@github.com> Message-ID: On Mon, 24 Mar 2025 16:02:35 GMT, Srinivas Vamsi Parasa wrote: >> Re-approved :) > >> Re-approved :) > > Thank you, Emanuel! :) This caused a regression: [JDK-8356281](https://bugs.openjdk.org/browse/JDK-8356281) @vamsi-parasa Could you please have a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2855038426 From chagedorn at openjdk.org Tue May 6 15:37:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 15:37:16 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lundén wrote: > Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. > > ### Changeset > > Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default).
This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). > > The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). > > I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. Looks reasonable to me. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24960#pullrequestreview-2818744088 From sparasa at openjdk.org Tue May 6 15:41:21 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 6 May 2025 15:41:21 GMT Subject: RFR: 8349582: APX NDD code generation for OpenJDK [v16] In-Reply-To: <0c36D3W-1GHo4gbSmHye1tJKsDxJLjG3uxt0Hb0Qxpo=.6ce172e4-2e95-44d3-a293-315dcb8e64a5@github.com> References: <0c36D3W-1GHo4gbSmHye1tJKsDxJLjG3uxt0Hb0Qxpo=.6ce172e4-2e95-44d3-a293-315dcb8e64a5@github.com> Message-ID: On Mon, 24 Mar 2025 16:02:35 GMT, Srinivas Vamsi Parasa wrote: >> Re-approved :) > >> Re-approved :) > > Thank you, Emanuel! :) > This caused a regression: [JDK-8356281](https://bugs.openjdk.org/browse/JDK-8356281) @vamsi-parasa Could you please have a look? Hi Tobias, thank you for letting us know. I have been working on a fix for this issue over the last few days. The root cause has been figured out and it will be addressed soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23501#issuecomment-2855056348 From duke at openjdk.org Tue May 6 15:46:17 2025 From: duke at openjdk.org (duke) Date: Tue, 6 May 2025 15:46:17 GMT Subject: RFR: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:54:09 GMT, Volodymyr Paprotski wrote: > It looks like the `permv` mask isn't always 'all-ones' or 'all-zeroes'.
(Which is OK for real blend, but needs to be enforced via the flag for blend emulation) > > Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine) > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP >>> jtreg:test/jdk/jdk/incubator/vector 83 71 10 0 2 << > ============================== > TEST FAILURE > > After the fix: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS > > And on an AVX512 machine: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS @vpaprotsk Your change (at version fffd783e57bc876ecd209a7cb3b352ff0e74e266) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24645#issuecomment-2855071740 From epeter at openjdk.org Tue May 6 16:24:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 16:24:14 GMT Subject: RFR: 8356269: Issues after JDK-8295470 In-Reply-To: References: Message-ID: On Tue, 6 May 2025 14:37:37 GMT, Matthias Baesken wrote: > There are some issues with [JDK-8295470](https://bugs.openjdk.org/browse/JDK-8295470) > https://wiki.openjdk.org/display/CodeTools/jcstress seems to be dead now (also used in TestGenerator.java). > There is a typo hhttps at one place, needs to be fixed. Looks reasonable. 2 comments: the title is not very descriptive. Suggestion: `Fix broken web-links after JDK-8295470` Also: the JBS issue is not assigned to you, which means anybody might assign it to themselves at any time, and snatch it away from you "legally" ;) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25068#pullrequestreview-2818893434 From epeter at openjdk.org Tue May 6 16:26:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 16:26:24 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v7] In-Reply-To: References: <5JpWWMlRP-o60KZI9bU5bMq-dJePHvnKdUgigCfwbfo=.c5545951-3f36-43da-b082-79a3a00ac6c0@github.com> Message-ID: On Wed, 30 Apr 2025 22:37:43 GMT, Dhamoder Nalla wrote: > It appears that the bug is challenging to reproduce with the default values. @dhanalla Would the bug still trigger with two arrays of half the size? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2855198197 From vpaprotski at openjdk.org Tue May 6 16:29:20 2025 From: vpaprotski at openjdk.org (Volodymyr Paprotski) Date: Tue, 6 May 2025 16:29:20 GMT Subject: Integrated: 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts In-Reply-To: References: Message-ID: On Tue, 15 Apr 2025 03:54:09 GMT, Volodymyr Paprotski wrote: > It looks like the `permv` mask isn't always 'all-ones' or 'all-zeroes'.
(Which is OK for real blend, but needs to be enforced via the flag for blend emulation) > > Before the fix, `make test TEST="jdk/incubator/vector"` (on ECore machine) > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP >>> jtreg:test/jdk/jdk/incubator/vector 83 71 10 0 2 << > ============================== > TEST FAILURE > > After the fix: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS > > And on an AVX512 machine: > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/jdk/jdk/incubator/vector 83 81 0 0 2 > ============================== > TEST SUCCESS This pull request has now been integrated. Changeset: a6995a3d Author: Volodymyr Paprotski Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/a6995a3d42955f1f207c14be1634daf225b5ab3f Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8354473: Incorrect results for compress/expand tests with -XX:+EnableX86ECoreOpts Reviewed-by: jbhateja, sviswanathan, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24645 From rcastanedalo at openjdk.org Tue May 6 16:42:31 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 6 May 2025 16:42:31 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads Message-ID: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic
block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. #### Testing - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
------------- Commit messages: - Format - Remove extra line - Further clarify zLoadP candidate predicate and no-preceding-lea assertion - Rename machine node property to ins_is_late_expanded_null_check_candidate for clarity, and make it a total function - Update copyright year - Revert unnecessary changes - Move check to original location - Enable zLoadP as implicit null check candidates on riscv and ppc - Refactor assertion - Simplify test - ... and 15 more: https://git.openjdk.org/jdk/compare/e2ae50d8...dc5aa4fc Changes: https://git.openjdk.org/jdk/pull/25066/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345067 Stats: 385 lines in 15 files changed: 338 ins; 37 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From epeter at openjdk.org Tue May 6 16:44:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 16:44:18 GMT Subject: RFR: 8351950: C2: masked vector MIN/MAX AVX512: SIGFPE / no valid evex tuple_table entry In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Sat, 3 May 2025 15:49:24 GMT, Jatin Bhateja wrote: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. 
The displacement of the latter instruction is a multiple of the scale and the quotient (0x1) fits in one signed byte, so it can be represented by a 1-byte displacement encoding, while the former requires 4 bytes to represent the displacement in the instruction encoding. > > > 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 > EVEX OP MR SIB DISP IMM > --------------|----|----|----|---------------|-----| > 62 6b c1 40 25 84 ec 40 30 20 10 ff > > 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 > For full vector width operation, scalar matches with vector size, hence scale N = 64 > effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 > EVEX OP MR SIB DISP IMM > -------------|----|---|---|-----------|---| > 62 6b c1 40 25 44 ec 01 ff > > > Kindly review and share your feedback. > > Best Regards, > Jatin @jatin-bhateja Thank you for looking into this! The fix looks generally reasonable, thanks for adding all the tests! src/hotspot/cpu/x86/assembler_x86.cpp line 11542: > 11540: assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), ""); > 11541: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > 11542: attributes.set_address_attributes(/* tuple_type */ EVEX_FV,/* input_size_in_bits */ EVEX_NObit); @jatin-bhateja How is this `fma` case related to the `min / max` cases that were reported? I also did not find a test below. src/hotspot/cpu/x86/assembler_x86.cpp line 11571: > 11569: attributes.set_is_evex_instruction(); > 11570: attributes.set_embedded_opmask_register_specifier(mask); > 11571: attributes.set_address_attributes(/* tuple_type */ EVEX_FV, /* input_size_in_bits */ EVEX_NObit); @jatin-bhateja How are these `perm` cases related to the `min / max` cases that were reported? I also did not find a test below.
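The compressed disp8 rule from the quoted description can be sketched as a small standalone function. This is a sketch under stated assumptions, not the HotSpot assembler code: `try_compress_disp8` is a hypothetical helper, and the scale factor N is passed in directly rather than derived from the tuple type and vector length. Note that 0x10203040 is itself a multiple of 64; it needs a 4-byte displacement because the quotient 0x4080C1 does not fit in a signed byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true and stores the one-byte compressed
// displacement in *out when `disp` can be encoded as disp8*N for scale `n`.
// Returns false when a full 32-bit displacement is required.
bool try_compress_disp8(int32_t disp, int32_t n, int8_t* out) {
  assert(n > 0 && (n & (n - 1)) == 0);  // N is assumed to be a power of two
  if (disp % n != 0) {
    return false;                       // not a multiple of the scale
  }
  int32_t quotient = disp / n;
  if (quotient < -128 || quotient > 127) {
    return false;                       // multiple of N, but too large for one byte
  }
  *out = static_cast<int8_t>(quotient);
  return true;
}
```

For the two instructions in the description (N = 64), the displacement 0x40 compresses to the byte 0x01 seen in the second encoding, while 0x10203040 is rejected by the range check and keeps its four displacement bytes.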
------------- PR Review: https://git.openjdk.org/jdk/pull/25021#pullrequestreview-2818982572 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2075881555 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2075882162 From epeter at openjdk.org Tue May 6 16:50:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 16:50:15 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lundén wrote: > Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. > > ### Changeset > > Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below).
> > The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). > > I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. @dlunde Did I understand this right: a single node was transformed, and it created over 4k new nodes? DEBUG_ONLY(int live_nodes_before = C->live_nodes();) Node* nn = transform_old(n); DEBUG_ONLY(int live_nodes_after = C->live_nodes();) Do you know which node was transformed, and what exactly happens there? 
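The limit arithmetic from the quoted description can be written down as a small standalone sketch. The default values (80000 and 2000) follow from the numbers in the description; the function names `max_increase_per_iteration`, `bailout_threshold` and `increase_is_acceptable` are hypothetical and only mirror the shape of the check, not the actual C2 code.

```cpp
// Defaults as implied by the PR description: 2000 * 2 = 4000, 2000 * 3 = 6000.
constexpr int MaxNodeLimit         = 80000;
constexpr int NodeLimitFudgeFactor = 2000;

// Maximum live node increase tolerated in one iteration of the IGVN loop.
constexpr int max_increase_per_iteration(int multiplier) {
  return NodeLimitFudgeFactor * multiplier;
}

// Live node count at which IGVN bails out.
constexpr int bailout_threshold(int multiplier) {
  return MaxNodeLimit - max_increase_per_iteration(multiplier);
}

// Hypothetical mirror of the assert: the increase caused by transforming a
// single node must stay below the per-iteration limit.
constexpr bool increase_is_acceptable(int live_before, int live_after, int multiplier) {
  return live_after - live_before < max_increase_per_iteration(multiplier);
}

static_assert(bailout_threshold(2) == 76000, "bailout before the changeset");
static_assert(bailout_threshold(3) == 74000, "bailout after the changeset");
```

Under these assumptions the reported increase of 4470 nodes fails the check with the old multiplier of 2 and passes with the new multiplier of 3.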
------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2855273059 From epeter at openjdk.org Tue May 6 16:56:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 16:56:17 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes In-Reply-To: References: Message-ID: On Fri, 2 May 2025 13:44:10 GMT, Christian Hagedorn wrote: > In the test case, we have two Initialized Assertion Predicates that share the same `Bool` which is perfectly fine: > > ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) > > These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new innermost loop which is partially peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not-peel set. As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: > > ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) > > We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape > > ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) > > which cannot be handled by the backend. > > The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`.
> > Thanks, > Christian Thanks for fixing this, looks reasonable :) test/hotspot/jtreg/compiler/predicates/assertion/TestPhiAboveOpaqueInitializedAssertionPredicate.java line 29: > 27: * @summary Check that we do not introduce a Phi above a OpaqueInitializedAssertionPredicateNode during Partial Peeling. > 28: * @run main/othervm -Xcomp -XX:CompileOnly=compiler.predicates.assertion.TestPhiAboveOpaqueInitializedAssertionPredicate::test > 29: * compiler.predicates.assertion.TestPhiAboveOpaqueInitializedAssertionPredicate What about a run without `Xcomp`? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25006#pullrequestreview-2819013567 PR Review Comment: https://git.openjdk.org/jdk/pull/25006#discussion_r2075900903 From epeter at openjdk.org Tue May 6 17:00:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 17:00:25 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates In-Reply-To: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: On Fri, 2 May 2025 14:14:26 GMT, Christian Hagedorn wrote: > Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. > > Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. 
> > We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). > > How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: > > ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) > > All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. > > This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier > is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. > > Thanks, > Christian Looks reasonable, thanks for fixing this :) test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 185: > 183: * @test id=StressXcompMaxUnroll0 > 184: * @key randomness > 185: * @bug 8288981 Suggestion: * @bug 8288981 8356084 ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25007#pullrequestreview-2819020611 PR Review Comment: https://git.openjdk.org/jdk/pull/25007#discussion_r2075905349 From chagedorn at openjdk.org Tue May 6 17:12:29 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 17:12:29 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates [v2] In-Reply-To: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: > Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. > > Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. > > We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). 
> > How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: > > ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) > > All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. > > This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier > is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25007/files - new: https://git.openjdk.org/jdk/pull/25007/files/31edbb8f..19ede0a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25007&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25007&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25007.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25007/head:pull/25007 PR: https://git.openjdk.org/jdk/pull/25007 From epeter at openjdk.org Tue May 6 17:12:29 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 17:12:29 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates 
[v2] In-Reply-To: References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: On Tue, 6 May 2025 17:09:48 GMT, Christian Hagedorn wrote: >> Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. >> >> Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. >> >> We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). >> >> How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. 
Here is a snippet from the graph after applying Loop Peeling several times without the patch: >> >> ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) >> >> All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. >> >> This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier >> is currently working on. I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java > > Co-authored-by: Emanuel Peter Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25007#pullrequestreview-2819051345 From chagedorn at openjdk.org Tue May 6 17:12:30 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 17:12:30 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates [v2] In-Reply-To: References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: <9usOmTiN_JT-dMH9BPhsLBude-sCescycHNFeF0MgX8=.07c4100b-4d0d-4548-9d8e-6a7d5a029ef0@github.com> On Tue, 6 May 2025 16:56:10 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Update test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java >> >> Co-authored-by: Emanuel Peter > > test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java line 185: > >> 183: * @test id=StressXcompMaxUnroll0 >> 184: * @key randomness >> 185: * @bug 8288981 > > Suggestion: > > * @bug 8288981 8356084 Good point, updated! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25007#discussion_r2075922435 From chagedorn at openjdk.org Tue May 6 17:17:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 17:17:17 GMT Subject: RFR: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates [v2] In-Reply-To: References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: On Tue, 6 May 2025 17:12:29 GMT, Christian Hagedorn wrote: >> Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). 
Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. >> >> Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. >> >> We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). >> >> How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: >> >> ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) >> >> All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. >> >> This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier >> is currently working on. 
I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Update test/hotspot/jtreg/compiler/predicates/assertion/TestAssertionPredicates.java > > Co-authored-by: Emanuel Peter Thanks Emanuel for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25007#issuecomment-2855345925 From rehn at openjdk.org Tue May 6 17:19:19 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 6 May 2025 17:19:19 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v8] In-Reply-To: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> References: <5e1o1xtN0ZdQZGJi2aVmgCEApW625koeE9F53VhDi5E=.2390045d-844e-4800-8d4b-075a2a3a8793@github.com> Message-ID: On Mon, 5 May 2025 10:17:27 GMT, Yuri Gaevsky wrote: >> The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. >> >> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. > > Yuri Gaevsky has updated the pull request incrementally with one additional commit since the last revision: > > change slli+add sequence to shadd Hey, I'm sorry for not explaining this properly, maybe this helps: You have four coefficients - you want to process a batch of four, _OR_ a multiple of four.
This batch of four - we call this a lane:

    int lane = array[currentIndex] * m_pow_3 + array[currentIndex + 1] * m_pow_2 + array[currentIndex + 2] * m_pow_1 + array[currentIndex + 3] * m_pow_0;
    hashCode = hashCode * m_pow_4 + lane;

You can process multiple lanes by doing:

    int lane_1 = array[currentIndex ] * m_pow_3 + array[currentIndex + 1] * m_pow_2 + array[currentIndex + 2] * m_pow_1 + array[currentIndex + 3] * m_pow_0;
    int lane_2 = array[currentIndex+4] * m_pow_3 + array[currentIndex + 5] * m_pow_2 + array[currentIndex + 6] * m_pow_1 + array[currentIndex + 7] * m_pow_0;
    hashCode = hashCode * m_pow_4 + lane_1;
    hashCode = hashCode * m_pow_4 + lane_2;

So for example you could lay out the data like below using vlse32.v, a strided load.

    v2  = array[currentIndex]   | array[currentIndex+4] | .... | array[currentIndex+n*4]
    v4  = array[currentIndex+1] | array[currentIndex+5] | .... | array[currentIndex+1+n*4]
    v6  = array[currentIndex+2] | array[currentIndex+6] | .... | array[currentIndex+2+n*4]
    v8  = array[currentIndex+3] | array[currentIndex+7] | .... | array[currentIndex+3+n*4]
    v10 = sum lane 1            | sum lane 2            | .... | sum lane n

Now you can multiply every element in v2 by m_pow_3 without knowing the length of v2 (i.e. LMUL can be 1 or 8). Then sum each lane into v10, and finally, for each lane, multiply hashCode by m_pow_4 and add that lane sum.

When this is done, you have 0-3 elements left that you can process with scalar code.

So when you do:
`vsetvli vl_processing, count/4, emul, lmul`
vl_processing == number of lanes.
There is no need to know the length of the vector registers.

NOTE: I'm not saying this is better or faster than your version - it's hopefully an example of a vector length agnostic approach.
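The lane scheme above can be modeled in plain Java roughly as follows. This is only a sketch under stated assumptions: `LaneHashSketch`, `laneHashCode`, and the `maxLanes` parameter are names made up for illustration, `maxLanes` stands in for however many lanes the hardware grants per `vsetvli`, and the multiplier 31 with initial hash 1 is assumed to match `Arrays.hashCode(int[])`. It is not the RVV implementation from the patch.

```java
// Scalar model of a vector-length-agnostic, lane-based hash computation.
// All names here are hypothetical and exist only for this illustration.
class LaneHashSketch {
    static final int M_POW_1 = 31;
    static final int M_POW_2 = 31 * 31;
    static final int M_POW_3 = 31 * 31 * 31;
    static final int M_POW_4 = 31 * 31 * 31 * 31;

    static int laneHashCode(int[] array, int maxLanes) {
        if (maxLanes < 1) {
            throw new IllegalArgumentException("maxLanes must be at least 1");
        }
        int hashCode = 1;                    // assumed seed, as in Arrays.hashCode(int[])
        int index = 0;
        int lanesLeft = array.length / 4;    // one lane = four consecutive elements
        int[] laneSum = new int[maxLanes];   // plays the role of v10
        while (lanesLeft > 0) {
            // Plays the role of vsetvli: take as many lanes as are granted, at most lanesLeft.
            int vl = Math.min(maxLanes, lanesLeft);
            // Each j is one element position of the strided loads v2, v4, v6, v8.
            for (int j = 0; j < vl; j++) {
                int base = index + 4 * j;
                laneSum[j] = array[base] * M_POW_3
                           + array[base + 1] * M_POW_2
                           + array[base + 2] * M_POW_1
                           + array[base + 3];
            }
            // Fold the lane sums into the running hash, in lane order.
            for (int j = 0; j < vl; j++) {
                hashCode = hashCode * M_POW_4 + laneSum[j];
            }
            index += 4 * vl;
            lanesLeft -= vl;
        }
        // Scalar tail: the remaining 0-3 elements.
        for (; index < array.length; index++) {
            hashCode = hashCode * 31 + array[index];
        }
        return hashCode;
    }
}
```

The result does not depend on `maxLanes`, which is the point of the exercise: the loop never needs to know the register length up front, only how many lanes it was given in the current step.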
------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-2855340992 From chagedorn at openjdk.org Tue May 6 17:21:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 17:21:55 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes [v2] In-Reply-To: References: Message-ID: > In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: > > ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) > > These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: > > ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) > > We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape > > ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) > > which cannot be handled by the backend. > > The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. 
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: add run with Xbatch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25006/files - new: https://git.openjdk.org/jdk/pull/25006/files/7fcb8245..5c1bfaea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25006&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25006&range=00-01 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25006.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25006/head:pull/25006 PR: https://git.openjdk.org/jdk/pull/25006 From epeter at openjdk.org Tue May 6 17:21:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 6 May 2025 17:21:55 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 17:18:50 GMT, Christian Hagedorn wrote: >> In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: >> >> ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) >> >> These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. 
As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: >> >> ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) >> >> We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape >> >> ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) >> >> which cannot be handled by the backend. >> >> The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add run with Xbatch Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25006#pullrequestreview-2819076911 From chagedorn at openjdk.org Tue May 6 17:21:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 17:21:56 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 16:53:11 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> add run with Xbatch > > test/hotspot/jtreg/compiler/predicates/assertion/TestPhiAboveOpaqueInitializedAssertionPredicate.java line 29: > >> 27: * @summary Check that we do not introduce a Phi above a OpaqueInitializedAssertionPredicateNode during Partial Peeling. 
>> 28: * @run main/othervm -Xcomp -XX:CompileOnly=compiler.predicates.assertion.TestPhiAboveOpaqueInitializedAssertionPredicate::test >> 29: * compiler.predicates.assertion.TestPhiAboveOpaqueInitializedAssertionPredicate > > What about a run without `Xcomp`? We would probably then also need to add some more invocations of `test()` inside `main()`. Probably does not hurt. The test is quite fast. I've updated the test. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25006#discussion_r2075933957 From chagedorn at openjdk.org Tue May 6 17:24:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 6 May 2025 17:24:15 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes In-Reply-To: References: Message-ID: <9ZVGT3XmqmOzVRCrBtiY5MKP8zfK9ENX-2yfoYDR4DM=.7a7312f0-d44c-4320-8174-07dd01be927d@github.com> On Fri, 2 May 2025 13:44:10 GMT, Christian Hagedorn wrote: > In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: > > ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) > > These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. 
As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: > > ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) > > We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape > > ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) > > which cannot be handled by the backend. > > The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. > > Thanks, > Christian Thanks for your review Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25006#issuecomment-2855365073 From jbhateja at openjdk.org Tue May 6 17:28:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 17:28:48 GMT Subject: RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v4] In-Reply-To: References: Message-ID: > Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature. > Intel E-core Xeons support only the AVX2 feature set and still compile Java implementation which is composed of logical operations. > > Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before being operated upon by logical operation there is an opportunity to optimize this using an efficient instruction sequence. > > Patch uses ANDPS and ANDPD logical instruction to generate efficient instruction sequences to absorb domain copy over penalty. Also, performs minor tuning for existing AVX512 instruction sequence based on VPTERNLOG instruction. 
>
> Following are the performance numbers of the following existing microbenchmark
> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java
>
> Patch passes following validation test
> [test/jdk/java/lang/Math/IeeeRecommendedTests.java](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java)
>
>
> Granite Rapids-AP (P-core Xeon)
> Baseline AVX512:
> Benchmark                      Mode  Cnt     Score   Error   Units
> Signum._5_copySignFloatTest   thrpt    2  1296.141          ops/ns
> Signum._7_copySignDoubleTest  thrpt    2   838.954          ops/ns
>
> Withopt :
> Benchmark                      Mode  Cnt     Score   Error   Units
> Signum._5_copySignFloatTest   thrpt    2   940.240          ops/ns
> Signum._7_copySignDoubleTest  thrpt    2   967.370          ops/ns
>
> Baseline AVX2:
> Benchmark                      Mode  Cnt     Score   Error   Units
> Signum._5_copySignFloatTest   thrpt    2    63.673          ops/ns
> Signum._7_copySignDoubleTest  thrpt    2    26.898          ops/ns
>
> Withopt :
> Benchmark                      Mode  Cnt     Score   Error   Units
> Signum._5_copySignFloatTest   thrpt    2   785.801          ops/ns
> Signum._7_copySignDoubleTest  thrpt    2   558.710          ops/ns
>
> Sierra Forest (E-core Xeon)
> Baseline:
> Benchmark                                       (seed)   Mode  Cnt    Score   Error   Units
> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2   40.528          ops/ns
> o.o.b.vm.compiler.Signum._7_copySignDoubleTest     N/A  thrpt    2   25.101          ops/ns
>
> Withopt:
> Benchmark                                       (seed)   Mode  Cnt    Score   Error   Units
> o.o.b.vm.compiler.Signum._5_copySignFloatTest      N/A  thrpt    2  676.101          ops/ns
> o.o.b.vm.compiler.Signum._7_copySignDoubleTest     N/A  thrpt    2  ...

Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:

- Review comments resolutions
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8349138
- Adding vector support along with some refactoring.
- Adding IR framework verification test - 8349138: Optimize Math.copySign API for Intel e-core and p-core targets ------------- Changes: https://git.openjdk.org/jdk/pull/23386/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23386&range=03 Stats: 342 lines in 10 files changed: 304 ins; 5 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/23386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23386/head:pull/23386 PR: https://git.openjdk.org/jdk/pull/23386 From dlunden at openjdk.org Tue May 6 17:49:16 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 6 May 2025 17:49:16 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Tue, 6 May 2025 15:34:51 GMT, Christian Hagedorn wrote: >> Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. >> >> ### Changeset >> >> Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). 
>> >> The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). >> >> I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. > > Looks resonable to me. @chhagedorn Thanks for the review! @eme64 > @dlunde Did I understand this right: a single node was transformed, and it created over 4k new nodes? Yes, a single call to `transform_old` resulted in more than 4k new nodes. > ``` > DEBUG_ONLY(int live_nodes_before = C->live_nodes();) > Node* nn = transform_old(n); > DEBUG_ONLY(int live_nodes_after = C->live_nodes();) > ``` > > Do you know which node was transformed, and what exactly happens there? I investigated, but did not manage to reproduce the failure locally (so I could not look at it in detail). No success with reproducing through replay files either. In Oracle-internal testing, the failure reproduces in only 1% of the test runs. I did do a simple dump of the nodes `n` and `nn` during an iteration that triggered the assert, and got the below. 
19517 Phi === 1444 17794 19566 19568 19570 14402 19571 19573 19575 19576 19580 19584 19585 19591 19597 19598 19606 19617 19720 19704 19826 19703 19837 19618 19851 19701 19956 19700 19970 19698 19987 19697 20004 19695 20024 19694 20044 19692 20067 19691 20090 19689 20116 19688 20142 19686 20171 19685 20200 19683 20232 19682 20264 20301 19620 17860 20405 19680 20441 20480 19621 19678 20583 19677 20622 20666 20768 17855 20869 19675 20912 19623 20958 19673 21058 19672 21104 19624 17850 17850 21155 19670 21206 19668 21260 19667 21314 17845 21373 19665 21431 19663 21492 19662 21553 21615 19660 21678 19659 21742 21807 19657 21873 19656 21940 22008 19654 22077 19653 22147 22218 19651 22290 19650 22363 22437 19648 22512 19647 22588 22665 19645 22743 19644 22822 22902 19642 22983 19641 23065 23148 19639 23232 19638 23317 23403 19636 23490 19635 23578 23667 19633 23757 19632 23848 23940 19630 24033 19629 24127 24222 19627 24318 19626 24415 24513 [[ 19529 ]] #memory Memory: @java/lang/Long (java/io/Serializable,java/lang/Comparable,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact+16 *,iid=4638, name=value, idx=32; !orig=19506,[2618],[39511],[764] !jvms: VarHandleTestByteArrayAsLong::testArrayReadWrite @ bci:83 (line 1059) 19517 Phi === 1444 17794 19566 19568 19570 14402 19571 19573 19575 19576 19580 19584 19585 19591 19597 19598 19606 19617 19720 19704 19826 19703 19837 19618 19851 19701 19956 19700 19970 19698 19987 19697 20004 19695 20024 19694 20044 19692 20067 19691 20090 19689 20116 19688 20142 19686 20171 19685 20200 19683 20232 19682 20264 20301 19620 17860 20405 19680 20441 20480 19621 19678 20583 19677 20622 20666 20768 17855 20869 19675 20912 19623 20958 19673 21058 19672 21104 19624 17850 17850 21155 19670 21206 19668 21260 19667 21314 17845 21373 19665 21431 19663 21492 19662 21553 21615 19660 21678 19659 21742 21807 19657 21873 19656 21940 22008 19654 22077 19653 22147 22218 19651 22290 19650 22363 22437 19648 22512 19647 22588 22665 
19645 22743 19644 22822 22902 19642 22983 19641 23065 23148 19639 23232 19638 23317 23403 19636 23490 19635 23578 23667 19633 23757 19632 23848 23940 19630 24033 19629 24127 24222 19627 24318 19626 24415 24513 [[ 19529 ]] #memory Memory: @java/lang/Long (java/io/Serializable,java/lang/Comparable,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact+16 *,iid=4638, name=value, idx=32; !orig=19506,[2618],[39511],[764] !jvms: VarHandleTestByteArrayAsLong::testArrayReadWrite @ bci:83 (line 1059) That is, a (locally unchanged) large Phi node. I would assume `PhiNode::Ideal` added 4k new nodes somewhere further up the inputs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2855425252 From dlunden at openjdk.org Tue May 6 18:05:16 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 6 May 2025 18:05:16 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> Message-ID: <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> On Tue, 6 May 2025 09:38:37 GMT, Galder Zamarreño wrote: >> Unfortunately, it was not straightforward to revert the fix for [JDK-8260420](https://bugs.openjdk.org/browse/JDK-8260420) (too many changes since then). If `!LCA_orig->dominates(pred_block) || early->dominates(pred_block)` failed at some point, then the new assert `early->dominates(LCA_orig)` must also fail in that situation (in theory). See the details in my other response above. > > Ok, maybe I was not clear enough.
> > The comment says: > >> // Triggers an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification is disabled: > > My question is, does the test on top of which the comment is placed (`test4` right?) ever run with loop strip mining verification disabled and if it does, does the assert get triggered? If `test4` does not do this, seems to me it would be nice to have an additional test that verifies just that rather than accept/assume the comment as valid without a test that actually verifies this. Thoughts? I would assume `test4` is a regression test and that the `assert` no longer triggers in any situation (otherwise we'd still have a bug)? Also, from what I can see, there is no VM flag that disables loop strip mining verification. Perhaps I'm still misunderstanding you? @TobiHartmann I see you added this test back in 2021, could you help bring us some clarity? This changeset only renames the occurrence of `insert_anti_dependences` in `TestSplitIfPinnedLoadInStripMinedLoop.java` to `raise_above_anti_dependences`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2076003886 From sparasa at openjdk.org Tue May 6 18:08:31 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 6 May 2025 18:08:31 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v19] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
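The demotion rule in the description above can be sketched as a stand-alone predicate. This is a hedged illustration under stated assumptions, not the HotSpot implementation: the function names are hypothetical, only register operands are considered, and the sketch assumes that a request to suppress flag updates (the "no flags" form) is only expressible in the extended EVEX encoding and therefore blocks demotion.

```cpp
#include <cassert>

// Hypothetical helpers for illustration; not HotSpot code.

// An NDD instruction "dst = src1 op src2" can fall back to the shorter
// two-operand legacy form "dst op= src2" only when dst and src1 are the same
// register and the caller did not ask for flag suppression.
inline bool can_demote_ndd(bool no_flags, int dst_enc, int src1_enc) {
  assert(dst_enc >= 0 && dst_enc < 32);
  assert(src1_enc >= 0 && src1_enc < 32);
  return !no_flags && dst_enc == src1_enc;
}

// Assumption: register encodings 16..31 are the APX extended registers,
// which need a REX2 prefix in the legacy form; 0..15 can use plain REX.
inline bool needs_rex2(int reg_enc) {
  assert(reg_enc >= 0 && reg_enc < 32);
  return reg_enc >= 16;
}
```

For example, an add whose destination and first source are both register 3 is a demotion candidate, while the same add with distinct registers, or with flag suppression requested, keeps the extended EVEX form.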
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: use size == EVEX_16bit to emit 0x66 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/67d9b3b9..ca6b83a3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=17-18 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Tue May 6 18:08:31 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 6 May 2025 18:08:31 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v18] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 18:39:14 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> remove unused functions: orw and evex_prefix_int8_operand_ndd > > src/hotspot/cpu/x86/assembler_x86.cpp line 12976: > >> 12974: if (pre == VEX_SIMD_66) { >> 12975: emit_int8(0x66); >> 12976: } > > We could do this based on size instead: if (size == EVEX_16bit). Please see updated code using size == EVEX_16bit > src/hotspot/cpu/x86/assembler_x86.cpp line 13044: > >> 13042: bool demote = is_demotable(no_flags, dst_enc, nds_enc); >> 13043: if (demote) { >> 13044: (size == EVEX_64bit) ? prefixq_and_encode(dst_enc) : prefix_and_encode(dst_enc); > > This could be: > (size == EVEX_64bit) ? prefixq(dst_enc) : prefix(dst_enc); The code was updated as (size == EVEX_64bit) ? 
(void) prefixq_and_encode(dst_enc) : (void) prefix_and_encode(dst_enc); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2076004105 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2076004952 From kvn at openjdk.org Tue May 6 18:10:20 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 18:10:20 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Why the attribute is not set for `zLoadP` on x64? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2819201282 From jbhateja at openjdk.org Tue May 6 18:11:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 6 May 2025 18:11:21 GMT Subject: RFR: 8351950: C2: masked vector MIN/MAX AVX512: SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Tue, 6 May 2025 16:40:05 GMT, Emanuel Peter wrote: >> PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. >> AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. >> >> e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. >> >> >> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 >> EVEX OP MR SIB DISP IMM >> --------------|----|----|----|---------------|-----| >> 62 6b c1 40 25 84 ec 40 30 20 10 ff >> >> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 >> For full vector width operation, scalar matches with vector size, hence scale N = 64 >> effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 >> EVEX OP MR SIB DISP IMM >> -------------|----|---|---|-----------|---| >> 62 6b c1 40 25 44 ec 01 ff >> >> >> Kindly review and share your feedback. 
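The compressed-displacement rule described above can be checked with a short sketch. The function name is hypothetical and this is not the assembler's code; it assumes the encoded disp8 is a signed byte that the hardware multiplies by the scale N.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: decides whether `disp` can be encoded as an EVEX
// compressed displacement (disp8*N) and, if so, stores the encoded byte.
inline bool compress_disp8(int32_t disp, int32_t n, int8_t* encoded) {
  assert(n > 0);
  assert(encoded != nullptr);
  if (disp % n != 0) {
    return false;                 // not a multiple of the scale N
  }
  const int32_t quotient = disp / n;
  if (quotient < -128 || quotient > 127) {
    return false;                 // scaled value does not fit in a signed byte
  }
  *encoded = static_cast<int8_t>(quotient);
  return true;
}
```

With N = 64, displacement 0x40 compresses to the single byte 0x01, matching the second encoding in the example. Displacement 0x10203040 is also a multiple of 64, but its quotient is far outside the signed-byte range, so it still needs the 4-byte form. Both conditions matter, not only divisibility.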
>> >> Best Regards, >> Jatin > > src/hotspot/cpu/x86/assembler_x86.cpp line 11542: > >> 11540: assert(vector_len == AVX_512bit || VM_Version::supports_avx512vl(), ""); >> 11541: InstructionAttr attributes(vector_len, /* vex_w */ true,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); >> 11542: attributes.set_address_attributes(/* tuple_type */ EVEX_FV,/* input_size_in_bits */ EVEX_NObit); > > @jatin-bhateja How is this `fma` case related to the `min / max` cases that were reported? I did also not find a test below. Hi @eme64 , For tuple_type Fully Vector (FV) scale factor (N) does not take into account the lane size, thus EVEX_NObit is right argument here, using EVEX_32bit will not cause functional correctness as lane size is anyways ignored, but EVEX_NObit better conveys our intent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2076011064 From vlivanov at openjdk.org Tue May 6 18:21:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 18:21:13 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Tue, 6 May 2025 07:43:57 GMT, Marc Chevalier wrote: >> support arbitrary nodes to be lowered into leaf runtime calls. A leaf runtime call which doesn't depend or change memory state can be inserted at arbitrary points in the graph. So, an arbitrary data node can be lowered into a runtime call once the place to insert it is known/chosen. > Overall, I see the weaknesses of my design, but I'm not sure which direction to take instead. I suggest to experiment with untangling `ModF`/`ModD` from `CallLeaf`, making them expensive nodes (to avoid commoning during GVN) , and still lower them into `CallLeaf`. (It doesn't have to be part of existing macro expansion. Depending on implementation considerations, earlier or later may be more appropriate. But it should be expanded before RA kicks in.) 
The hard part is probably related to picking a point in CFG to insert the call, but the control the node has may be not suitable for that (e.g., if inputs don't dominate control anymore). In that case, updating control input during loop opts may be an option. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2855510094 From kvn at openjdk.org Tue May 6 18:42:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 18:42:13 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level In-Reply-To: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: <6hO6sv_xTTfD8CuETfeCFvN0oURmjfX9PDIwzd4EnG4=.35d3674c-6e9e-444f-af5f-bc47586530b9@github.com> On Tue, 6 May 2025 09:52:24 GMT, Aleksey Shipilev wrote: > We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. > > However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. > > Additional testing: > - [x] Eyeballing `-Xlog:jit*` logs after the patch > - [ ] Linux x86_64 server fastdebug, `all` PrintInlining and PrintIntrinsics are diagnostic flags (while PrintCompilation is product). So mapping UL `Info` to product flag and `Debug` to diagnostic seems valid. Based on this, I agree with changes to `CT::print_ul()` but not others. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25061#pullrequestreview-2819277571 From cjplummer at openjdk.org Tue May 6 18:51:18 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 6 May 2025 18:51:18 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v13] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 06:31:43 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: > > - Merge branch 'master' into pp2 > - Fix compile > - Fix additional issues > - Make sure command line flags that affect MDO layout are consistent > - Fix semantics change from the previous commit > - Port 8355915: [leyden] Crash in MDO clearing the unloaded array type > - Fix flag behavior > - Fix log tags > - Remove the proxy class counter > - Address review comments part 2 > - ... and 33 more: https://git.openjdk.org/jdk/compare/e09d2e27...7d22a42a src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java line 129: > 127: metadataTypeArray[5] = db.lookupType("InstanceStackChunkKlass"); > 128: metadataTypeArray[6] = db.lookupType("Method"); > 129: metadataTypeArray[9] = db.lookupType("MethodData"); It looks like MethodData inheriting from Metadata is not a new change, but has always been the case. I'm surprised this didn't cause any test failures before your changes. Did you end up with test failures after your changes? 
src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 154: > 152: if (!VM.getVM().isCore()) { > 153: virtualConstructor.addMapping("CompilerThread", CompilerThread.class); > 154: virtualConstructor.addMapping("TrainingReplayThread", TrainingReplayThread.class); The new SA TrainingReplayThread class is not needed since it only overrides isHiddenFromExternalView() to return true. You can instead use HiddenJavaThread.class here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2076064357 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2076058595 From rcastanedalo at openjdk.org Tue May 6 19:00:18 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 6 May 2025 19:00:18 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 18:07:17 GMT, Vladimir Kozlov wrote: > Why the attribute is not set for `zLoadP` on x64? `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2855603683 From shade at openjdk.org Tue May 6 19:18:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 19:18:54 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: > We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. > > However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. 
> > Additional testing: > - [x] Eyeballing `-Xlog:jit*` logs after the patch > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Only do jit+compilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25061/files - new: https://git.openjdk.org/jdk/pull/25061/files/2e1b9e64..2b8c9576 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25061&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25061&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25061/head:pull/25061 PR: https://git.openjdk.org/jdk/pull/25061 From shade at openjdk.org Tue May 6 19:18:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 6 May 2025 19:18:55 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: <6hO6sv_xTTfD8CuETfeCFvN0oURmjfX9PDIwzd4EnG4=.35d3674c-6e9e-444f-af5f-bc47586530b9@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> <6hO6sv_xTTfD8CuETfeCFvN0oURmjfX9PDIwzd4EnG4=.35d3674c-6e9e-444f-af5f-bc47586530b9@github.com> Message-ID: On Tue, 6 May 2025 18:39:43 GMT, Vladimir Kozlov wrote: > PrintInlining and PrintIntrinsics are diagnostic flags (while PrintCompilation is product). So mapping UL `Info` to product flag and `Debug` to diagnostic seems valid. Based on this, I agree with changes to `CT::print_ul()` but not others. I am mostly interested in `PrintCompilation` myself, so that would be an acceptable compromise. However, I do believe that `PrintInlining` along with `TraceTypeProfile` are very useful to figure out performance anomalies in the field. Those really should not be diagnostic, and UL should really be "info" for them :) But we can have that discussion at some point later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25061#issuecomment-2855647445 From iveresov at openjdk.org Tue May 6 21:50:34 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 6 May 2025 21:50:34 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/7d22a42a..11e3c398 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=12-13 Stats: 36 lines in 2 files changed: 0 ins; 35 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iveresov at openjdk.org Tue May 6 21:50:36 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 6 May 2025 21:50:36 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v13] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 18:48:03 GMT, Chris Plummer wrote: >> Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 43 commits: >> >> - Merge branch 'master' into pp2 >> - Fix compile >> - Fix additional issues >> - Make sure command line flags that affect MDO layout are consistent >> - Fix semantics change from the previous commit >> - Port 8355915: [leyden] Crash in MDO clearing the unloaded array type >> - Fix flag behavior >> - Fix log tags >> - Remove the proxy class counter >> - Address review comments part 2 >> - ... and 33 more: https://git.openjdk.org/jdk/compare/e09d2e27...7d22a42a > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/memory/FileMapInfo.java line 129: > >> 127: metadataTypeArray[5] = db.lookupType("InstanceStackChunkKlass"); >> 128: metadataTypeArray[6] = db.lookupType("Method"); >> 129: metadataTypeArray[9] = db.lookupType("MethodData"); > > It looks like MethodData inheriting from Metadata is not a new change, but has always been the case. I'm surprised this didn't cause any test failures before your changes. Did you end up with test failures after your changes? Honestly I don't remember, I think @iklam did these changes. > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 154: > >> 152: if (!VM.getVM().isCore()) { >> 153: virtualConstructor.addMapping("CompilerThread", CompilerThread.class); >> 154: virtualConstructor.addMapping("TrainingReplayThread", TrainingReplayThread.class); > > The new SA TrainingReplayThread class is not needed since it only overrides isHiddenFromExternalView() to return true. You can instead use HiddenJavaThread.class here. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2076373507 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2076369998 From kvn at openjdk.org Tue May 6 22:58:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 6 May 2025 22:58:14 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: On Tue, 6 May 2025 19:18:54 GMT, Aleksey Shipilev wrote: >> We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. >> >> However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. >> >> Additional testing: >> - [x] Eyeballing `-Xlog:jit*` logs after the patch >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Only do jit+compilation Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25061#pullrequestreview-2819895653 From vlivanov at openjdk.org Tue May 6 23:21:18 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 23:21:18 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 11:19:54 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > build fixes for non-x86 targets Very nice! I made a cleanup pass over the code [1].
Feel free to incorporate it or let me know if you have any questions/concerns. Meanwhile, submitted it for testing. [1] https://github.com/iwanowww/jdk/commit/35aeb88d0d5667c9e4f699bb9b3b7169af96446a ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2819173067 From vlivanov at openjdk.org Tue May 6 23:21:19 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 6 May 2025 23:21:19 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v14] In-Reply-To: References: <2ioSQVtfXhnqvAXqiadwR1HuJsz3t9nytY0wRps-x68=.35220ade-0e70-41c6-9ebd-a271e7dcb2bb@github.com> Message-ID: On Tue, 6 May 2025 08:45:15 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.hpp line 707: >> >>> 705: // >>> 706: static bool supports_cpuid() { return _features != 0; } >>> 707: static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } >> >> Since you touch this code anyway, I suggest to use this opportunity to automatically derive this code using `CPU_FEATURE_FLAGS` macro. (As an example [1].) >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.hpp#L147 > > Unlike AARCH64, there is not a 1:1 mapping b/w CPU_* features and the corresponding support checkers; some AVX512 checkers use multiple features. Skipping this for now for consistency. Sure, I'm fine with addressing it separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2075993391 From dzhang at openjdk.org Wed May 7 01:01:26 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 7 May 2025 01:01:26 GMT Subject: RFR: 8356188: RISC-V: Cleanup effect of vmaskcmp_fp [v2] In-Reply-To: References: Message-ID: <7sHTZoexWFU06pFvpz-rl9yXtuhelzqIPn4oM8tKoYw=.389ba8fd-f373-46e0-9318-46868051d061@github.com> On Tue, 6 May 2025 02:42:32 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! 
>> >> See "The RISC-V Instruction Set Manual" at https://riscv.org/technical/specifications/. In the Vector Floating-Point Compare Instructions section: >> >> The destination mask vector register may be the same as the source vector mask register (v0). >> >> Also, the integer form of `vmaskcmp` has no effect too, so remove the effect of vmaskcmp_fp. >> >> ### Testing >> qemu-system 9.1.0 with UseRVV (ubuntu24.10): >> * [x] Run test/jdk/jdk/incubator/vector (fastdebug) > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8356188 > - 8356188: RISC-V: Cleanup effect of vmaskcmp_fp Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25055#issuecomment-2856716186 From dzhang at openjdk.org Wed May 7 01:01:26 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 7 May 2025 01:01:26 GMT Subject: Integrated: 8356188: RISC-V: Cleanup effect of vmaskcmp_fp In-Reply-To: References: Message-ID: On Tue, 6 May 2025 01:34:33 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > See "The RISC-V Instruction Set Manual" at https://riscv.org/technical/specifications/. In the Vector Floating-Point Compare Instructions section: > > The destination mask vector register may be the same as the source vector mask register (v0). > > Also, the integer form of `vmaskcmp` has no effect too, so remove the effect of vmaskcmp_fp. > > ### Testing > qemu-system 9.1.0 with UseRVV (ubuntu24.10): > * [x] Run test/jdk/jdk/incubator/vector (fastdebug) This pull request has now been integrated. 
Changeset: acad0b49 Author: Dingli Zhang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/acad0b4968f931a00664f18fd22ee97fdb001099 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8356188: RISC-V: Cleanup effect of vmaskcmp_fp Reviewed-by: fyang, gcao ------------- PR: https://git.openjdk.org/jdk/pull/25055 From fyang at openjdk.org Wed May 7 01:51:18 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 7 May 2025 01:51:18 GMT Subject: RFR: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB [v2] In-Reply-To: <1Da_mkv8g0xGl13SPBP1Bo1EfDodNNOtXgt_lO8PaCU=.5a2cc392-94f4-4550-92ea-e04998acdda0@github.com> References: <1Da_mkv8g0xGl13SPBP1Bo1EfDodNNOtXgt_lO8PaCU=.5a2cc392-94f4-4550-92ea-e04998acdda0@github.com> Message-ID: On Tue, 6 May 2025 11:17:57 GMT, Hamlin Li wrote: > > What about the vector-scalar variants (vsaddu.vx, vsaddu.vi, etc.)? Do they help in any way? > > I think so, although not sure how much benefit it will bring, as it should be able to do a vmv first, then use the instructs in this patch, so there should be some improvement, but maybe just minor one. And for other operations, like (signed/unsigned) We already have some vector-scalar examples like `vadd_vx`, `vadd_vi` [1][2]. I guess it will be similar for this case as well. Maybe just replicate the scalar src2 to get a vector in the match rule will do, like: ` match(Set dst (SaturatingAddV src1 (Replicate src2))); ` [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_v.ad#L446 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/riscv_v.ad#L417 > max/min, mulb/s/i/l/f/d, and so on, I think we can also introduce the _vx and _vi version. Maybe we could implement these bunch of instructs in another patch together? Sure, OK. 
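For readers unfamiliar with the operations under discussion, the following is a minimal scalar model of lane-wise saturating addition, written as a sketch rather than as HotSpot code. All function names (`sat_addu8`, `sat_add8`, `sat_addu_vv`, `sat_addu_vx`) are hypothetical and chosen only for illustration. It shows why a vector-scalar form should give the same result as first replicating the scalar into a vector and then using the vector-vector form, which is what matching `(Replicate src2)` in the rule is meant to express.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Unsigned saturating add for one 8-bit lane: clamp to 255 instead of wrapping.
inline std::uint8_t sat_addu8(std::uint8_t a, std::uint8_t b) {
    const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    const unsigned max = std::numeric_limits<std::uint8_t>::max();
    return static_cast<std::uint8_t>(std::min(sum, max));
}

// Signed saturating add for one 8-bit lane: clamp to [-128, 127].
inline std::int8_t sat_add8(std::int8_t a, std::int8_t b) {
    const int sum = static_cast<int>(a) + static_cast<int>(b);
    const int lo = std::numeric_limits<std::int8_t>::min();
    const int hi = std::numeric_limits<std::int8_t>::max();
    return static_cast<std::int8_t>(std::max(lo, std::min(sum, hi)));
}

// Vector-vector form (hypothetical model): lane-wise saturating add.
inline std::vector<std::uint8_t> sat_addu_vv(const std::vector<std::uint8_t>& a,
                                             const std::vector<std::uint8_t>& b) {
    const std::size_t n = std::min(a.size(), b.size());
    std::vector<std::uint8_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = sat_addu8(a[i], b[i]);
    }
    return out;
}

// Vector-scalar form (hypothetical model): the scalar is applied to every lane,
// so no separate broadcast step is needed.
inline std::vector<std::uint8_t> sat_addu_vx(const std::vector<std::uint8_t>& a,
                                             std::uint8_t s) {
    std::vector<std::uint8_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = sat_addu8(a[i], s);
    }
    return out;
}
```

In this model, `sat_addu_vx(a, s)` and `sat_addu_vv(a, replicated_s)` agree lane by lane; the only difference is whether the broadcast is a separate step.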
------------- PR Comment: https://git.openjdk.org/jdk/pull/25005#issuecomment-2856784116 From duke at openjdk.org Wed May 7 02:10:56 2025 From: duke at openjdk.org (erifan) Date: Wed, 7 May 2025 02:10:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: > > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > 
testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. 
For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne. Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 
10239567.69 6487.441672 1.29 testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 testCompareLTMaskNotLong ops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37 testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41 testCompareNEMaskNotByte ops/s 7910002.518 2771.82477 10245315.33 16321.93935 1.29 testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36 testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.1401 1.36 testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4 testCompareNEMaskNotLong ops/s 836327.6835 410.525466 1178178.825 308.757932 1.4 testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41 testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29 testCompareUGEMaskNotInt ops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4 testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.6621 1.38 testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41 testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29 testCompareUGTMaskNotInt ops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38 testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38 testCompareUGTMaskNotShort ops/s 3327236.988 1616.886789 4711209.865 3098.494145 1.41 testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29 testCompareULEMaskNotInt ops/s 1672350.925 1545.304304 2353231.755 914.231193 1.4 testCompareULEMaskNotLong ops/s 853349.4765 9804.906913 1177967.254 435.044367 1.38 testCompareULEMaskNotShort ops/s 3325757.891 1555.062257 4712873.187 1650.986905 1.41 testCompareULTMaskNotByte ops/s 7912218.621 2633.477744 10242095.98 21921.39902 1.29 testCompareULTMaskNotInt ops/s 1673994.849 2672.507666 2353449.22 946.105757 1.4 testCompareULTMaskNotLong ops/s 849032.5868 
10406.06689 1177586.047 506.541456 1.38 testCompareULTMaskNotShort ops/s 3328062.026 1892.991844 4713247.216 1855.983724 1.41 ``` With option `-XX:UseSVE=0`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7895961.919 72712.90804 7746493.731 71481.92938 0.98 testCompareEQMaskNotDouble ops/s 789811.0455 384.493088 766473.7994 2216.581793 0.97 testCompareEQMaskNotFloat ops/s 1806305.818 638.010451 1819616.613 3295.38958 1 testCompareEQMaskNotInt ops/s 1815820.144 1225.336135 1849538.401 766.29902 1.01 testCompareEQMaskNotLong ops/s 807336.492 335.451807 792732.9483 277.954432 0.98 testCompareEQMaskNotShort ops/s 4818266.38 1927.862665 4668903.001 1922.782715 0.96 testCompareGEMaskNotByte ops/s 7818439.678 75374.97739 16498003.98 41440.49653 2.11 testCompareGEMaskNotInt ops/s 1815159.05 1090.912209 2372095.779 1664.397112 1.3 testCompareGEMaskNotLong ops/s 804324.5575 2301.686878 927919.8507 371.766719 1.15 testCompareGEMaskNotShort ops/s 4818966.563 2443.643652 5385561.038 29558.37423 1.11 testCompareGTMaskNotByte ops/s 7893406.157 82687.74264 16470663.2 22165.55812 2.08 testCompareGTMaskNotInt ops/s 1815316.812 915.894106 2370447.198 655.016338 1.3 testCompareGTMaskNotLong ops/s 807019.456 526.525482 928079.0541 330.582693 1.15 testCompareGTMaskNotShort ops/s 4820552.881 1684.247747 5355902.93 5893.2915 1.11 testCompareLEMaskNotByte ops/s 7816263.323 79560.0015 16473621.19 56688.99585 2.1 testCompareLEMaskNotInt ops/s 1814915.724 926.998625 2368790.306 932.594778 1.3 testCompareLEMaskNotLong ops/s 806483.9 935.718082 928110.9074 407.096695 1.15 testCompareLEMaskNotShort ops/s 4813660.241 6817.870509 5357107.852 10061.47975 1.11 testCompareLTMaskNotByte ops/s 7838948.962 69136.4504 16424405.96 24464.75469 2.09 testCompareLTMaskNotInt ops/s 1815056.833 1187.6453 2369892.187 1103.819634 1.3 testCompareLTMaskNotLong ops/s 806602.1804 287.923365 928346.4118 617.682824 1.15 testCompareLTMaskNotShort ops/s 4817940.643 2767.1509 
5372537.84 15397.47169 1.11 testCompareNEMaskNotByte ops/s 9078493.798 4630.339307 16484348.42 18925.88346 1.81 testCompareNEMaskNotDouble ops/s 661769.6272 398.712981 926763.5839 1808.843788 1.4 testCompareNEMaskNotFloat ops/s 1570527.252 563.642144 2312425.678 1815.844846 1.47 testCompareNEMaskNotInt ops/s 1619146.58 626.793854 2369711.543 942.330478 1.46 testCompareNEMaskNotLong ops/s 680201.5381 2252.836482 927808.6147 414.917863 1.36 testCompareNEMaskNotShort ops/s 3763508.054 3622.560798 5367808.015 8591.466599 1.42 testCompareUGEMaskNotByte ops/s 7886373.129 75917.74675 16480928.93 27524.31005 2.08 testCompareUGEMaskNotInt ops/s 1815636.832 750.036241 2369683.015 901.609404 1.3 testCompareUGEMaskNotLong ops/s 806862.5826 287.819616 928001.4394 361.063837 1.15 testCompareUGEMaskNotShort ops/s 4820581.361 2098.537435 5375854.248 25619.40165 1.11 testCompareUGTMaskNotByte ops/s 7891591.465 96614.93542 16410405.93 15012.37096 2.07 testCompareUGTMaskNotInt ops/s 1814871.179 662.825588 2371325.903 1170.491164 1.3 testCompareUGTMaskNotLong ops/s 804013.7658 2240.534209 928062.2169 531.306897 1.15 testCompareUGTMaskNotShort ops/s 4818150.337 3051.717685 5381449.337 21212.34187 1.11 testCompareULEMaskNotByte ops/s 7831540.628 81306.67253 16495250.78 38682.19675 2.1 testCompareULEMaskNotInt ops/s 1814484.14 687.860656 2369265.075 940.609586 1.3 testCompareULEMaskNotLong ops/s 807780.5749 769.876816 927538.0732 1278.267724 1.14 testCompareULEMaskNotShort ops/s 4817437.42 5141.336541 5356183.359 7015.608124 1.11 testCompareULTMaskNotByte ops/s 7849078.225 56753.59764 16395975.27 34043.67295 2.08 testCompareULTMaskNotInt ops/s 1814328.226 2697.219111 2370700.47 1991.841988 1.3 testCompareULTMaskNotLong ops/s 807166.8197 253.061506 927926.2803 252.933462 1.14 testCompareULTMaskNotShort ops/s 4821098.216 1625.959044 5348980.243 4100.768121 1.1 ``` Benchmarks on AMD EPYC 9124 16-Core Processor: With option `-XX:UseAVX=3`: ``` Benchmark Unit Before Score Error After Score 
Error Uplift testCompareEQMaskNotByte ops/s 16607323.35 1233692.631 18381557.66 1163201.522 1.1 testCompareEQMaskNotDouble ops/s 2114285.245 58782.2534 2959946.353 43016.0445 1.39 testCompareEQMaskNotFloat ops/s 4480874.437 89975.29074 6960151.436 64799.143 1.55 testCompareEQMaskNotInt ops/s 4370906.91 51784.80889 6856955.043 313858.5504 1.56 testCompareEQMaskNotLong ops/s 2080065.895 26762.06732 2939142.143 67179.05314 1.41 testCompareEQMaskNotShort ops/s 7968282.563 210437.2781 12701214.56 473152.6407 1.59 testCompareGEMaskNotByte ops/s 18419141.89 473408.9451 19880059.68 321638.0397 1.07 testCompareGEMaskNotInt ops/s 4419015.62 77352.98633 7037639.227 151066.0383 1.59 testCompareGEMaskNotLong ops/s 2147982.48 49227.42782 3000275.928 39298.75344 1.39 testCompareGEMaskNotShort ops/s 8469039.613 17833.19707 12288229.49 244317.8812 1.45 testCompareGTMaskNotByte ops/s 18728997.5 468328.8358 20544730.05 392264.6466 1.09 testCompareGTMaskNotInt ops/s 4510009.705 78812.57357 7364629.942 70970.78473 1.63 testCompareGTMaskNotLong ops/s 2124104.969 40917.89257 2953536.279 35199.19687 1.39 testCompareGTMaskNotShort ops/s 8690557.621 311534.1159 12344017.51 457931.8741 1.42 testCompareLEMaskNotByte ops/s 17758400.53 478383.4945 19209183.26 1143297.241 1.08 testCompareLEMaskNotInt ops/s 4363664.862 43443.18063 7054093.064 78141.11476 1.61 testCompareLEMaskNotLong ops/s 2068632.213 29844.78023 2954766.412 50667.22502 1.42 testCompareLEMaskNotShort ops/s 8637608.548 183538.5511 12719010.27 473568.8825 1.47 testCompareLTMaskNotByte ops/s 14406138.95 423105.0163 17292417.96 371386.9689 1.2 testCompareLTMaskNotInt ops/s 4546707.266 131977.3144 7040483.394 213590.4657 1.54 testCompareLTMaskNotLong ops/s 2123277.356 47243.21499 2848720.442 58896.97045 1.34 testCompareLTMaskNotShort ops/s 7570169.363 649873.6295 11945383.75 988276.5955 1.57 testCompareNEMaskNotByte ops/s 18274529.55 683396.7384 19081938.8 1118739.778 1.04 testCompareNEMaskNotDouble ops/s 2112533.61 43295.50012 
2912115.441 78189.51083 1.37 testCompareNEMaskNotFloat ops/s 4628683.814 93817.07362 6967208.729 145135.8544 1.5 testCompareNEMaskNotInt ops/s 4470900.214 75974.50842 7286913.662 116328.5277 1.62 testCompareNEMaskNotLong ops/s 2134091.061 46377.94061 2934667.477 81675.46021 1.37 testCompareNEMaskNotShort ops/s 8790384.287 396161.8599 13076858.35 286272.1155 1.48 testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97 testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54 testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42 testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53 testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11 testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74 testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44 testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47 testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05 testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46 testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34 testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44 testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18 testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57 testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44 testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5 ``` The small amount of performance degradation is due to test fluctuations. 
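The rewrite described in this commit message depends on `NOT (a cond b)` being the same as `a ncond b`. The sketch below is a scalar model of that identity, not code from the patch; the names `Cond`, `negate`, `compare` and `not_equals_negated` are made up for illustration, and only the signed conditions are modelled. It also shows one likely reason the floating-point case is limited to eq and ne: with a NaN operand every ordered comparison is false, so `!(a < b)` and `a >= b` disagree, while `!(a == b)` and `a != b` still agree.

```cpp
#include <limits>

// Signed comparison conditions; the unsigned ones (ule, uge, ult, ugt) are
// omitted here for brevity.
enum class Cond { eq, ne, lt, ge, gt, le };

// ncond: the condition that holds exactly when cond does not,
// assuming totally ordered operands (true for integers).
inline Cond negate(Cond c) {
    switch (c) {
        case Cond::eq: return Cond::ne;
        case Cond::ne: return Cond::eq;
        case Cond::lt: return Cond::ge;
        case Cond::ge: return Cond::lt;
        case Cond::gt: return Cond::le;
        case Cond::le: return Cond::gt;
    }
    return c;  // not reached
}

// Evaluate one lane of a comparison.
template <typename T>
bool compare(T a, T b, Cond c) {
    switch (c) {
        case Cond::eq: return a == b;
        case Cond::ne: return a != b;
        case Cond::lt: return a < b;
        case Cond::ge: return a >= b;
        case Cond::gt: return a > b;
        case Cond::le: return a <= b;
    }
    return false;  // not reached
}

// True when NOT(a cond b) gives the same lane value as (a ncond b),
// i.e. when folding the XOR with all-ones into the compare is safe.
template <typename T>
bool not_equals_negated(T a, T b, Cond c) {
    return (!compare(a, b, c)) == compare(a, b, negate(c));
}
```

For integer operands `not_equals_negated` holds for every condition. For `double` operands it holds for eq and ne even when one operand is NaN, but fails for lt, le, gt and ge in that case.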
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/4fbf84e3..001fac0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=03-04 Stats: 27362 lines in 755 files changed: 19223 ins; 4813 del; 3326 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From duke at openjdk.org Wed May 7 02:10:56 2025 From: duke at openjdk.org (erifan) Date: Wed, 7 May 2025 02:10:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v3] In-Reply-To: References: <1Qo4pB9I7Ok4ntXSE-KkE0sv-Tp5EVCWriWnjcf2iEE=.a7e28640-85df-436a-9c82-3c067cc88dee@github.com> Message-ID: On Thu, 1 May 2025 07:32:22 GMT, erifan wrote: >> Yes, this discussion is down to `requires` vs `applyIf`. This is my argument for `applyIf`, quoted from above, I have not yet seen an argument against it: >> >>> If you use @require, then the person does not realize there is a test AND the test is not run. If you use applyIf, the person does not realize there is a test, but it is run at least for result verifiation - and then the person MIGHT realize if the test catches a wrong result / crash. >> >> In my understanding, `requires` should only be used if the test really **requires** a certain platform or feature. That can be because some flags are only available under certain platforms for example. But for IR tests, we should try to always use `applyIf`, because it allows testing on other platforms. >> >> Actually, I filed this RFE a while ago: https://bugs.openjdk.org/browse/JDK-8310891 >> We should try to move as many tests from using `requires` to `applyIf`, so that we have an increased test coverage. > > @eme64 @jatin-bhateja I have updated the test, thanks for your suggestion. > @erifan thanks for updating the tests! 
> > Now I had a quick look at the VM code. > > My biggest observation is this: > > Wrapping `VectorNode::Ideal` somewhere in the middle of your new optimization is going to make future optimizations here much harder. How would they check their conditions next to yours? That would be quite a mess. > > I suggest you do this: > > * `XorVNode::Ideal` does > > * checks `in1 == in2` case > * calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it succeeded, i.e. returns `nullptr`. > * ... future optimizations could go here ... > * Finally, i.e. none of the optimizations above worked: call `VectorNode::Ideal` > > Then you pack all your new logic here into `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is just what I came up with just now. > > This gives us a much more **modular** design, and it is easier to add another new optimization to `XorVNode::Ideal`. It is easy to change the precedence of the optimizations by just changing the order, etc. > > Examples of this "modular" design: > > * `CMoveNode::Ideal` -> calls `TypeNode::Ideal` and `Ideal_minmax`. > * `StoreBNode::Ideal` -> calls `StoreNode::Ideal_masked_input` and `StoreNode::Ideal_sign_extended_input` > These are really nice, because you can quickly see what optimizations we already have, and in which order they are checked. Yes, this is a good idea, I have changed it like this, thanks for your suggestion. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2856811448 From duke at openjdk.org Wed May 7 02:10:56 2025 From: duke at openjdk.org (erifan) Date: Wed, 7 May 2025 02:10:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 06:14:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2216: >> >>> 2214: in2->is_predicated_vector()) { >>> 2215: with_predicated = true; >>> 2216: } >> >> Suggestion: >> >> bool with_predicated = is_predicated_vector() || >> in1->is_predicated_vector() || >> in2->is_predicated_vector(); > > Would that not be easier to read? Yes, done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076645802 From duke at openjdk.org Wed May 7 02:10:58 2025 From: duke at openjdk.org (erifan) Date: Wed, 7 May 2025 02:10:58 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v4] In-Reply-To: References: Message-ID: <4ZhJjZtiOsmaTZzKbYFE99Q-J01WP_8kDB3Egx5znHY=.54b633f8-73ba-4083-8946-524c8bd6e47e@github.com> On Fri, 2 May 2025 06:16:03 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. 
>> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 1... 
> > src/hotspot/share/opto/vectornode.cpp line 2224: > >> 2222: // => (VectorMaskCmp src1 src2 ncond) >> 2223: // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> 2224: // negative comparison of cond. > > Suggestion: > > // cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt. > // ncond is the negative comparison of cond. > > I was getting lost in all the commas. Done. > src/hotspot/share/opto/vectornode.cpp line 2248: > >> 2246: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || >> 2247: !VectorNode::is_all_ones_vector(in2)) { >> 2248: return VectorNode::Ideal(phase, can_reshape); > > Hmm, so this is really the "else" case, if your optimization does not succeed, right? > > Wrapping `VectorNode::Ideal` somewhere in the middle is going to make future optimizations here much harder. > How would they check their conditions next to yours? That would be quite a mess. > > I suggest you do this: > - `XorVNode::Ideal` does > - checks `in1 == in2` case > - calls a method called `XorVNode::Ideal_XorV_VectorMaskCmp`. Check if it succeeded, i.e. returns `nullptr`. > - Finally, i.e. none of the optimizations above worked: call `VectorNode::Ideal` > > Then you pack all your new logic here into `XorVNode::Ideal_XorV_VectorMaskCmp`. You can also find a better name, it is just what I came up with just now. > > This gives us a much more **modular** design, and it is easier to add another new optimization to `XorVNode::Ideal`. It is easy to change the precedence of the optimizations by just changing the order, etc. Done, thanks! 
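The modular layout suggested in this review can be sketched with a toy node type. This is an illustration under stated assumptions, not the HotSpot implementation: `Node`, `make`, `negate_cond` and the `ideal_*` helpers are hypothetical stand-ins, and the sketch assumes the usual convention that a helper returns a null pointer when its pattern does not apply.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy IR node (hypothetical, not HotSpot's Node class).
struct Node {
    std::string op;    // e.g. "XorV", "VectorMaskCmp", "AllOnes", "Zero"
    std::string cond;  // comparison condition, only used by "VectorMaskCmp"
    std::shared_ptr<Node> in1;
    std::shared_ptr<Node> in2;
};
using NodePtr = std::shared_ptr<Node>;

inline NodePtr make(std::string op, std::string cond = "",
                    NodePtr in1 = nullptr, NodePtr in2 = nullptr) {
    auto n = std::make_shared<Node>();
    n->op = std::move(op);
    n->cond = std::move(cond);
    n->in1 = std::move(in1);
    n->in2 = std::move(in2);
    return n;
}

// Returns the negated condition, or "" if the condition cannot be inverted.
inline std::string negate_cond(const std::string& c) {
    if (c == "eq") return "ne";
    if (c == "ne") return "eq";
    if (c == "lt") return "ge";
    if (c == "ge") return "lt";
    if (c == "gt") return "le";
    if (c == "le") return "gt";
    return "";
}

// Optimization 1: x ^ x is all zeros.
inline NodePtr ideal_same_inputs(const NodePtr& n) {
    if (n->in1 && n->in1 == n->in2) return make("Zero");
    return nullptr;
}

// Optimization 2: (XorV (VectorMaskCmp a b cond) AllOnes) => (VectorMaskCmp a b ncond).
inline NodePtr ideal_xor_mask_cmp(const NodePtr& n) {
    if (!n->in1 || !n->in2) return nullptr;
    if (n->in1->op != "VectorMaskCmp" || n->in2->op != "AllOnes") return nullptr;
    const std::string ncond = negate_cond(n->in1->cond);
    if (ncond.empty()) return nullptr;  // predicate cannot be inverted
    return make("VectorMaskCmp", ncond, n->in1->in1, n->in1->in2);
}

// Stand-in for the generic fallback; this toy version never transforms anything.
inline NodePtr ideal_generic(const NodePtr&) { return nullptr; }

// Top-level driver: try each optimization in order, then fall back.
inline NodePtr xor_ideal(const NodePtr& n) {
    if (NodePtr r = ideal_same_inputs(n)) return r;
    if (NodePtr r = ideal_xor_mask_cmp(n)) return r;
    // ... future optimizations would be added here ...
    return ideal_generic(n);
}
```

With this shape, adding another rewrite or changing precedence only means adding or reordering one line in `xor_ideal`, and each pattern's conditions stay inside its own helper.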
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076646028 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076646366 From asmehra at openjdk.org Wed May 7 03:54:13 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 7 May 2025 03:54:13 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> Message-ID: On Tue, 6 May 2025 14:11:54 GMT, Ashutosh Mehra wrote: > I can recreate this test locally. Looking into it. Looks like the code for C2 blobs generated in assembly phase is not correct. For example, code for new_instance blob is: [0.141s][550409][aot,codecache,stubs] Decoding CodeBlob, name: C2 Runtime new_instance, at [0x00007fed0f904260, 0x00007fed0f9042b8] 88 bytes [0.141s][550409][aot,codecache,stubs] ;; N1: # out( B1 ) <- in( B3 B2 ) Freq: 1 [0.141s][550409][aot,codecache,stubs] ;; B1: # out( B3 B2 ) <- BLOCK HEAD IS JUNK Freq: 1 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904260: sub $0x8,%rsp [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904267: mov %rbp,(%rsp) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90426b: mov %rsp,0x3e8(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904272: mov %rsi,%rdi [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904275: mov %r15,%rsi [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904278: movabs $0x7fed28908982,%r10 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904282: call *%r10 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904285: nopl 0x0(%rax,%rax,1) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90428d: mov %r12,0x3e8(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904294: mov %r12,0x3f0(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90429b: mov 0x440(%r15),%rax 
[0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042a2: mov %r12,0x440(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042a9: cmp 0x8(%r15),%r12 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042ad: jne 0x00007fed0f9042b1 [0.141s][550409][aot,codecache,stubs] ;; B2: # out( N1 ) <- in( B1 ) Freq: 0.999999 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042af: pop %rbp [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b0: ret [0.141s][550409][aot,codecache,stubs] ;; B3: # out( N1 ) <- in( B1 ) Freq: 1e-06 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b1: pop %rbp [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b2: jmp Stub::forward_exception [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b7: hlt Look at the instructions generated after the call to runtime method: [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90428d: mov %r12,0x3e8(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904294: mov %r12,0x3f0(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90429b: mov 0x440(%r15),%rax [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042a2: mov %r12,0x440(%r15) [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042a9: cmp 0x8(%r15),%r12 [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042ad: jne 0x00007fed0f9042b1 Comparing it with the case when AOT code cache is not used: 0x7fffdef07c8d: movq $0x0,0x3e8(%r15) 0x7fffdef07c98: movq $0x0,0x3f0(%r15) 0x7fffdef07ca3: mov 0x440(%r15),%rax 0x7fffdef07caa: movq $0x0,0x440(%r15) 0x7fffdef07cb5: mov 0x8(%r15),%r10 0x7fffdef07cb9: test %r10,%r10 0x7fffdef07cbc: jne 0x7fffdef07cc0 For some reason r12 is used instead of null (0x0). r12 is the CompressedOop::base address. C2 code that generates this is https://github.com/openjdk/jdk/blob/762423d64d10dcdb37800767d2b2f1b7757c804a/src/hotspot/share/opto/generateOptoStub.cpp#L222 I suspect I missed porting a change from premain. @adinn @vnkozlov any idea what that could be? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2856943220 From kvn at openjdk.org Wed May 7 06:02:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 06:02:15 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlassPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab Easy to fix.
Look for checks `UseCompressedOops && (CompressedOops::base() == nullptr)` in predicates in `x86_64.ad` and add an additional check `!AOTCodeCache::is_on_for_dump()`. Maybe create a new function `r12_is_null()` or something. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2857157137 From aboldtch at openjdk.org Wed May 7 06:15:14 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 7 May 2025 06:15:14 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding As I cannot test this on APX enabled hardware, I will leave the testing and verifying that this approach works up to you. But the change looks good, and it maintains the original behaviour for none APX enabled hardware. ------------- Marked as reviewed by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2820461864 From jbhateja at openjdk.org Wed May 7 06:19:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 06:19:17 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. 
This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Hi @TobiHartmann , @eme64 , can you kindly run this version through your test infra. This is an APX-specific issue. I have verified its correctness using SDE, both following tests are now passing. 
https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/c2/irTests/gc ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857197887 From dfenacci at openjdk.org Wed May 7 07:00:28 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 7 May 2025 07:00:28 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lund?n wrote: > Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. > > ### Changeset > > Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). > > The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. 
Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). > > I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. Looks good to me. Just a quick (curiosity) question: why did you choose to multiply by 3 and not, for instance, doubling the current max amount (i.e. x 4)? ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/24960#pullrequestreview-2820582438 From jbhateja at openjdk.org Wed May 7 07:03:32 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 07:03:32 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 02:10:56 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. 
>> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 
1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4... src/hotspot/share/opto/vectornode.cpp line 2231: > 2229: } > 2230: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || > 2231: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || Do you plan to extend your testcase / matching logic to cover following equivalent patterns: - compare.xor(maskAll(true)) - compare.xor(VectorMask.fromLong(SPECIES, -1L)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2076939541 From shade at openjdk.org Wed May 7 07:07:18 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 May 2025 07:07:18 GMT Subject: RFR: 8356259: Lift basic -Xlog:jit* logging to "info" level [v2] In-Reply-To: References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: On Tue, 6 May 2025 19:18:54 GMT, Aleksey Shipilev wrote: >> We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. 
These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. >> >> However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. >> >> Additional testing: >> - [x] Eyeballing `-Xlog:jit*` logs after the patch >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Only do jit+compilation OK, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25061#issuecomment-2857360250 From duke at openjdk.org Wed May 7 07:09:23 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 7 May 2025 07:09:23 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function Message-ID: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. 
------------- Commit messages: - 8356328: Some C2 IR nodes miss size_of() function Changes: https://git.openjdk.org/jdk/pull/25081/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356328 Stats: 4 lines in 3 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25081/head:pull/25081 PR: https://git.openjdk.org/jdk/pull/25081 From mbaesken at openjdk.org Wed May 7 07:22:21 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 7 May 2025 07:22:21 GMT Subject: RFR: 8356269: Fix broken web-links after JDK-8295470 In-Reply-To: References: Message-ID: <90fOl3WwWvI-FtRjQ6qmA-U4fQboiFY5A5J0vQTBaes=.8018f778-04d1-443f-a72a-36eae02df416@github.com> On Tue, 6 May 2025 14:37:37 GMT, Matthias Baesken wrote: > There are some issues with [JDK-8295470](https://bugs.openjdk.org/browse/JDK-8295470) > https://wiki.openjdk.org/display/CodeTools/jcstress seems to be dead now (also used in TestGenerator.java). > There is a typo hhttps at one place, needs to be fixed. Thanks for the reviews ! I adjusted the title as suggested. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25068#issuecomment-2857392231 From mbaesken at openjdk.org Wed May 7 07:22:22 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 7 May 2025 07:22:22 GMT Subject: Integrated: 8356269: Fix broken web-links after JDK-8295470 In-Reply-To: References: Message-ID: On Tue, 6 May 2025 14:37:37 GMT, Matthias Baesken wrote: > There are some issues with [JDK-8295470](https://bugs.openjdk.org/browse/JDK-8295470) > https://wiki.openjdk.org/display/CodeTools/jcstress seems to be dead now (also used in TestGenerator.java). > There is a typo hhttps at one place, needs to be fixed. This pull request has now been integrated. 
Changeset: 21f01e0c Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/21f01e0c89e40ae2701d8cb24c737be78f4dcd19 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod 8356269: Fix broken web-links after JDK-8295470 Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25068 From chagedorn at openjdk.org Wed May 7 07:27:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 May 2025 07:27:15 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function In-Reply-To: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Wed, 7 May 2025 07:04:26 GMT, kuaiwei wrote: > I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. > > PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. Good catch! It would currently only be a problem when we clone nodes, which is probably hard to check statically (a node could, for example, be part of a loop body and then be cloned). Some questions: - Have you also checked the Mach nodes? - Have you also checked that `cmp()` is overridden in case `hash()` is not `NO_HASH` for those nodes that specify at least one field? Just a side note: you can also just use `sizeof(*this)`, which is often done in the code.
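The concern discussed above can be illustrated with a minimal, self-contained sketch. It is not HotSpot code: `SketchNode`, `ForgetfulNode`, `CarefulNode`, `size_of_is_consistent()` and `demo()` are hypothetical names, and the sketch only assumes what the thread states, namely that a subclass which adds fields should also report its own size through `size_of()`. The extra field is deliberately large; as noted in the thread, a small `bool` can hide in alignment padding and mask the mismatch on some platforms.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a C2 node base class; not HotSpot code.
struct SketchNode {
  int _idx = 0;
  virtual ~SketchNode() = default;
  // Number of bytes a size_of()-driven copy of this object would cover.
  virtual std::size_t size_of() const { return sizeof(SketchNode); }
};

// Adds fields but forgets to override size_of(): a size_of()-driven copy
// would stop after the SketchNode part and miss _extra.
struct ForgetfulNode : SketchNode {
  long _extra[4] = {};
};

// Adds the same fields and reports its own size.
struct CarefulNode : SketchNode {
  long _extra[4] = {};
  std::size_t size_of() const override { return sizeof(CarefulNode); }
};

// Consistency check: the reported size must equal the static size of T.
template <typename T>
bool size_of_is_consistent(const T& node) {
  return node.size_of() == sizeof(T);
}

// Usage: only the forgetful node is flagged as inconsistent.
inline void demo() {
  assert(size_of_is_consistent(SketchNode{}));
  assert(!size_of_is_consistent(ForgetfulNode{}));
  assert(size_of_is_consistent(CarefulNode{}));
}
```

In this sketch, writing `sizeof(*this)` inside `CarefulNode::size_of()` would give the same value as `sizeof(CarefulNode)`, since the static type of `*this` there is `CarefulNode`; which spelling to prefer is the style question raised in the thread.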
------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2857406497 From duke at openjdk.org Wed May 7 07:38:16 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 7 May 2025 07:38:16 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Wed, 7 May 2025 07:24:30 GMT, Christian Hagedorn wrote: > Good catch! It would currently only be a problem when we clone nodes which is probably hard to check statically (could, for example, be part of a loop body and then be cloned). > > Some questions: > > * Have you also checked the Mach nodes? > * Have you also checked that `cmp()` is overridden in case `hash()` is not `NO_HASH` for those nodes that specify at least one field? > > Just a side note, you can also just use `sizeof(*this)` which is often done in the code. I checked the node list in share/opto/classes.hpp, so MachNode/MachNullCheckNode/MachProjNode are checked. For Mach nodes created by adlc, I found that adlc always adds a size_of() function. I haven't checked `cmp()` and `hash()`; I will check whether my test can cover these. IMO, `sizeof(SomeNode)` is clearer than `sizeof(*this)`, so I chose this style. Thanks for your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2857436783 From duke at openjdk.org Wed May 7 07:39:53 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 7 May 2025 07:39:53 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' Message-ID: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds.
------------- Commit messages: - Add -XX:+IgnoreUnrecognizedVMOptions to fix testing on product builds Changes: https://git.openjdk.org/jdk/pull/25082/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25082&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356310 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25082/head:pull/25082 PR: https://git.openjdk.org/jdk/pull/25082 From mchevalier at openjdk.org Wed May 7 07:39:53 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 7 May 2025 07:39:53 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' In-Reply-To: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 07:28:53 GMT, Manuel Hässig wrote: > This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. Been there, done that... test/hotspot/jtreg/compiler/print/TestPrintAssemblyDeoptRace.java line 26: > 24: /* > 25: * @test > 26: * @bug 8258229 8356310 I'm not sure how much it helps a future reader who wants to figure out what the test is about. Doesn't hurt so much either tho. ------------- Marked as reviewed by mchevalier (Author).
PR Review: https://git.openjdk.org/jdk/pull/25082#pullrequestreview-2820675781 PR Review Comment: https://git.openjdk.org/jdk/pull/25082#discussion_r2076992638 From rcastanedalo at openjdk.org Wed May 7 07:41:19 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 7 May 2025 07:41:19 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Tue, 6 May 2025 15:34:51 GMT, Christian Hagedorn wrote: >> Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. >> >> ### Changeset >> >> Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). >> >> The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. 
Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). >> >> I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. > > Looks resonable to me. > @chhagedorn Thanks for the review! > > @eme64 > > > @dlunde Did I understand this right: a single node was transformed, and it created over 4k new nodes? > > Yes, a single call to `transform_old` resulted in more than 4k new nodes. > > > ``` > > DEBUG_ONLY(int live_nodes_before = C->live_nodes();) > > Node* nn = transform_old(n); > > DEBUG_ONLY(int live_nodes_after = C->live_nodes();) > > ``` > > > > > > > > > > > > > > > > > > > > > > > > Do you know which node was transformed, and what exactly happens there? > > I investigated, but did not manage to reproduce the failure locally (so I could not look at it in detail). No success with reproducing through replay files either. In Oracle-internal testing, the failure reproduces in only 1% of the test runs. I did do a simple dump of the nodes `n` and `nn` during an iteration that triggered the assert, and got the below. 
> > ``` > 19517 Phi === 1444 17794 19566 19568 19570 14402 19571 19573 19575 19576 19580 19584 19585 19591 19597 19598 19606 19617 19720 19704 19826 19703 19837 19618 19851 19701 19956 19700 19970 19698 19987 19697 20004 19695 20024 19694 20044 19692 20067 19691 20090 19689 20116 19688 20142 19686 20171 19685 20200 19683 20232 19682 20264 20301 19620 17860 20405 19680 20441 20480 19621 19678 20583 19677 20622 20666 20768 17855 20869 19675 20912 19623 20958 19673 21058 19672 21104 19624 17850 17850 21155 19670 21206 19668 21260 19667 21314 17845 21373 19665 21431 19663 21492 19662 21553 21615 19660 21678 19659 21742 21807 19657 21873 19656 21940 22008 19654 22077 19653 22147 22218 19651 22290 19650 22363 22437 19648 22512 19647 22588 22665 19645 22743 19644 22822 22902 19642 22983 19641 23065 23148 19639 23232 19638 23317 23403 19636 23490 19635 23578 23667 19633 23757 19632 23848 23940 19630 24033 19629 24127 24222 19627 24318 19626 24415 24513 [[ 19529 ]] #memory Memory: @java/lang/Lon g (java/io/Serializable,java/lang/Comparable,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact+16 *,iid=4638, name=value, idx=32; !orig=19506,[2618],[39511],[764] !jvms: VarHandleTestByteArrayAsLong::testArrayReadWrite @ bci:83 (line 1059) > > 19517 Phi === 1444 17794 19566 19568 19570 14402 19571 19573 19575 19576 19580 19584 19585 19591 19597 19598 19606 19617 19720 19704 19826 19703 19837 19618 19851 19701 19956 19700 19970 19698 19987 19697 20004 19695 20024 19694 20044 19692 20067 19691 20090 19689 20116 19688 20142 19686 20171 19685 20200 19683 20232 19682 20264 20301 19620 17860 20405 19680 20441 20480 19621 19678 20583 19677 20622 20666 20768 17855 20869 19675 20912 19623 20958 19673 21058 19672 21104 19624 17850 17850 21155 19670 21206 19668 21260 19667 21314 17845 21373 19665 21431 19663 21492 19662 21553 21615 19660 21678 19659 21742 21807 19657 21873 19656 21940 22008 19654 22077 19653 22147 22218 19651 22290 19650 22363 22437 19648 22512 19647 
22588 22665 19645 22743 19644 22822 22902 19642 22983 19641 23065 23148 19639 23232 19638 23317 23403 19636 23490 19635 23578 23667 19633 23757 19632 23848 23940 19630 24033 19629 24127 24222 19627 24318 19626 24415 24513 [[ 19529 ]] #memory Memory: @java/lang/Lon g (java/io/Serializable,java/lang/Comparable,java/lang/constant/Constable,java/lang/constant/ConstantDesc):NotNull:exact+16 *,iid=4638, name=value, idx=32; !orig=19506,[2618],[39511],[764] !jvms: VarHandleTestByteArrayAsLong::testArrayReadWrite @ bci:83 (line 1059) > ``` > > That is, a (locally unchanged) large Phi node. I would assume `PhiNode::Ideal` added 4k new nodes somewhere further up the inputs. If we cannot bound the amount of nodes that can be created by `PhiNode::Ideal`, wouldn't it be more robust to simply disable the single-iteration node increase assertion for `PhiNode`? Otherwise there is the risk that we encounter the failure again with a slightly larger test case. Alternatively, if we could (?) derive a tighter bound for `PhiNode` (e.g. based on its number of inputs, number of memory slices for memory phis, etc.) we could try to compute it and use it in the assertion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2857442781 From roland at openjdk.org Wed May 7 07:43:21 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 May 2025 07:43:21 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: References: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> Message-ID: On Wed, 23 Apr 2025 09:03:48 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 41 commits: >> >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge >> - Merge branch 'master' into JDK-8342692 >> - Merge branch 'master' into JDK-8342692 >> - whitespace >> - Merge branch 'master' into JDK-8342692 >> - TestMemorySegment test fix >> - ... and 31 more: https://git.openjdk.org/jdk/compare/dc5c4148...065abb29 > > src/hotspot/share/opto/c2_globals.hpp line 824: > >> 822: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ >> 823: "long counted loop/long range checks: don't create loop nest if" \ >> 824: "loop runs for small enough number of iterations") \ > > Could it make sense to have `ShortLoopIter` be a flag as well? That would allow you to write a nice JMH benchmark, where we can modify the threshold :) > > Wait... you mention `ShortLoopIter` in the PR description, but it only occurs once in a comment... what happened here? See: https://github.com/openjdk/jdk/pull/21630#issuecomment-2538327199 I removed it following the discussion with @merykitty ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2077001856 From thartmann at openjdk.org Wed May 7 07:47:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 May 2025 07:47:16 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> Message-ID: <21d6dUz886V6-TbWTNyt22KX6UaBNbsfZQz3hnQVjNA=.edb3bbd3-0dba-4e54-92cf-88e680cc5149@github.com> On Tue, 6 May 2025 18:02:55 GMT, Daniel Lund?n wrote: >> Ok, 
maybe I was not clear enough. >> >> The comment says: >> >>> // Triggers an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification is disabled: >> >> My question is, does the test on top of which the comment is placed (`test4` right?) ever run with loop strip mining verification disabled and if it does, does the assert get triggered? If `test4` does not do this, seems to me it would be nice to have an additional test that verifies just that rather than accept/assume the comment as valid without a test that actually verifies this. Thoughts? > > I would assume `test4` is a regression test and that the `assert` no longer triggers in any situation (otherwise we'd still have a bug)? Also, from what I can see, there is no VM flag that disables loop strip mining verification. > > Perhaps I'm still misunderstanding you? @TobiHartmann I see you added this test back in 2021, could you help bring us some clarity? This changeset only renames the occurrence of `insert_anti_dependences` in `TestSplitIfPinnedLoadInStripMinedLoop.java` to `raise_above_anti_dependences`. > Is this test still valid? I think it's as valid as any other regression test. With time, there's no guarantee that the test would still trigger the original issue if the fix would accidentally be reverted. But these tests still have a lot of value because they trigger state that apparently no other test triggered and often they still reproduce the issue with older JDK releases (or reveal new issues with new changes). > but this assert appears to be removed? It's still here, see: https://github.com/openjdk/jdk/blob/910d77d39e6fb9ca339272c75fa4ff7ff99bffcf/src/hotspot/share/opto/gcm.cpp#L889-L890 > does the test on top of which the comment is placed (test4 right?) ever run with loop strip mining verification disabled and if it does, does the assert get triggered? 
It's not possible to disable loop strip mining but IIRC I disabled it manually to see what other issues we hit without verification, for example in release builds (see also my comments in https://github.com/openjdk/jdk/pull/2315). So this test basically covers a different failure mode. I think Daniel's change is good as it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077009625 From shade at openjdk.org Wed May 7 07:47:22 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 May 2025 07:47:22 GMT Subject: Integrated: 8356259: Lift basic -Xlog:jit* logging to "info" level In-Reply-To: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> References: <2fpJJXAU-vYZkTcjJtTiy5gie8wiw836gMv3kbcidXs=.47732a59-c5ce-4d66-9f40-8d78c657374f@github.com> Message-ID: <761jqrKse3Lh7FxmHrUMnDPws8xEXOMB-o-Ry1HT6QI=.4c6bae97-8e59-4aff-aaa3-56dfac751eaa@github.com> On Tue, 6 May 2025 09:52:24 GMT, Aleksey Shipilev wrote: > We have unified logging for JIT activity: -Xlog:jit+compilation, -Xlog:jit+inlining, etc. These serve as convenient replacements for -XX:+PrintCompilation, -XX:+PrintInlining, etc. And these replacements are useful, because UL can be forwarded to file, their format can be adjusted, and they can be handled asynchronously. > > However, all useful messages are on "debug" level, which is inconvenient and surprising. It is reasonable to expect some level of basic logging when supplying -Xlog:jit+compilation, e.g. "info" level. I believe we should lift at least some of the logging to "info" level for these. > > Additional testing: > - [x] Eyeballing `-Xlog:jit*` logs after the patch > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 50895835 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/50895835e0c78f54a0b33db7f42f3769e2a1e652 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8356259: Lift basic -Xlog:jit* logging to "info" level Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/25061 From thartmann at openjdk.org Wed May 7 07:48:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 May 2025 07:48:16 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <1gGtDEUALoWyrLQwwRD9bo2wb55O5Lh2DTnWTXQ8Oe8=.45ef5737-2ea6-4179-a998-79d8d51aca13@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. 
To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Sure, I'll run it through testing and report back. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2857462391 From chagedorn at openjdk.org Wed May 7 07:49:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 May 2025 07:49:13 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' In-Reply-To: References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 07:34:29 GMT, Marc Chevalier wrote: >> This PR adds `-XX:+IgnoreUnrecoginzedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. > > test/hotspot/jtreg/compiler/print/TestPrintAssemblyDeoptRace.java line 26: > >> 24: /* >> 25: * @test >> 26: * @bug 8258229 8356310 > > I'm not sure how much it helps a future reader that wants to figure what the test is about. Doesn't hurt so much either tho. That's a good point. 
Since it's only a follow-up fix to make the test work with product, you don't need to add the bug number of this issue. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25082#discussion_r2077009636 From epeter at openjdk.org Wed May 7 07:57:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 07:57:17 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 7 May 2025 07:38:16 GMT, Roberto Castañeda Lozano wrote: > If we cannot bound the amount of nodes that can be created by PhiNode::Ideal, wouldn't it be more robust to simply disable the single-iteration node increase assertion for PhiNode? Otherwise there is the risk that we encounter the failure again with a slightly larger test case. Alternatively, if we could (?) derive a tighter bound for PhiNode (e.g. based on its number of inputs, number of memory slices for memory phis, etc.) we could try to compute it and use it in the assertion. @robcasloz @dlunde Yes, such an exception may help us keep tight bounds on most nodes. And maybe we can even quantify more precisely how many nodes we expect to be created by `PhiNode::Ideal`. Maybe it is somehow linear in its inputs? @dlunde it would also be interesting to look more deeply into `PhiNode::Ideal`, and see what happens there. The Phi has 150+ inputs, but how does that generate 4k+ nodes? That would be 4000/150 ~ 25+ nodes per input. I'm just wondering if this is really sane? And is it profitable? Might it be better to check if we are creating that many nodes before doing it, and blowing through the node budget? It might be worth investigating. But I do hear that it is difficult to reproduce.
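The back-of-the-envelope estimate above can be sketched as follows. This is only a minimal illustration, assuming integer division and a lower bound of 150 inputs; the class and method names (`NodeBudgetSketch`, `nodesPerInput`, `exceedsBudget`) are hypothetical and nothing here is HotSpot code. The figures 4470 and 4000 are taken from the assert message in the subject line, and 6000 is the proposed relaxed limit.

```java
// Hypothetical sketch, not HotSpot code: reproduces the arithmetic from the
// discussion (new nodes per Phi input) and the shape of the per-iteration check.
public class NodeBudgetSketch {

    // Integer estimate of how many new nodes each Phi input accounts for.
    static int nodesPerInput(int newNodes, int inputs) {
        if (inputs <= 0) {
            throw new IllegalArgumentException("inputs must be positive");
        }
        return newNodes / inputs;
    }

    // Mirrors the failing condition of assert(increase < limit):
    // the budget is exceeded when the increase reaches or passes the limit.
    static boolean exceedsBudget(int increase, int limit) {
        return increase >= limit;
    }

    public static void main(String[] args) {
        int inputs = 150;       // "150+ inputs" in the comment; lower bound only
        int oldLimit = 4000;    // limit quoted in the assert message
        int newLimit = 6000;    // proposed relaxed limit
        int observed = 4470;    // increase reported by the failing assert

        System.out.println("per input at limit:    " + nodesPerInput(oldLimit, inputs));
        System.out.println("per input observed:    " + nodesPerInput(observed, inputs));
        System.out.println("exceeds old limit:     " + exceedsBudget(observed, oldLimit));
        System.out.println("exceeds relaxed limit: " + exceedsBudget(observed, newLimit));
    }
}
```

A guard of this shape would have to run before the transformation creates its nodes, which presupposes a cheap way to estimate the increase up front; whether `PhiNode::Ideal` allows such an estimate is exactly the open question raised above.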
------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2857485599 From duke at openjdk.org Wed May 7 07:57:19 2025 From: duke at openjdk.org (erifan) Date: Wed, 7 May 2025 07:57:19 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 06:59:34 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.156... > > src/hotspot/share/opto/vectornode.cpp line 2231: > >> 2229: } >> 2230: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || >> 2231: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || > > Do you plan to extend your testcase / matching logic to cover following equivalent patterns: > > - compare.xor(maskAll(true)) > - compare.xor(VectorMask.fromLong(SPECIES, -1L)) Oh, I didn't think of this case, let me try. 
Thanks~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077026150 From duke at openjdk.org Wed May 7 08:04:30 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 7 May 2025 08:04:30 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' [v2] In-Reply-To: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: > This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: - Remove unneeded UnlockDiagnosticVMOptions - Removed bug number ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25082/files - new: https://git.openjdk.org/jdk/pull/25082/files/4e42cb6a..36608159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25082&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25082&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25082.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25082/head:pull/25082 PR: https://git.openjdk.org/jdk/pull/25082 From thartmann at openjdk.org Wed May 7 08:04:31 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 May 2025 08:04:31 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' [v2] In-Reply-To: References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 08:02:04 GMT, Manuel Hässig wrote: >> This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product
builds. > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unneeded UnlockDiagnosticVMOptions > - Removed bug number Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25082#pullrequestreview-2820732676 From chagedorn at openjdk.org Wed May 7 08:04:31 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 May 2025 08:04:31 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' [v2] In-Reply-To: References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 08:02:04 GMT, Manuel Hässig wrote: >> This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unneeded UnlockDiagnosticVMOptions > - Removed bug number Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25082#pullrequestreview-2820733742 From rcastanedalo at openjdk.org Wed May 7 08:04:31 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 7 May 2025 08:04:31 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' [v2] In-Reply-To: References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: <86toVg-Dbs0423YuD9MDdeCnQLOdOH-oosSYVJfKFKU=.2844186d-fa19-47f5-94e9-ca634905c77e@github.com> On Wed, 7 May 2025 08:02:04 GMT, Manuel Hässig wrote: >> This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds.
> > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unneeded UnlockDiagnosticVMOptions > - Removed bug number Trivial. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25082#pullrequestreview-2820744454 From epeter at openjdk.org Wed May 7 08:04:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 08:04:32 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' [v2] In-Reply-To: References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 08:02:04 GMT, Manuel Hässig wrote: >> This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - Remove unneeded UnlockDiagnosticVMOptions > - Removed bug number Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25082#pullrequestreview-2820751695 From duke at openjdk.org Wed May 7 08:04:33 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 7 May 2025 08:04:33 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' In-Reply-To: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 07:28:53 GMT, Manuel Hässig wrote: > This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. I also removed the unneeded `-XX:+UnlockDiagnosticVMOptions` flag.
Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25082#issuecomment-2857495482 PR Comment: https://git.openjdk.org/jdk/pull/25082#issuecomment-2857502428 From duke at openjdk.org Wed May 7 08:04:33 2025 From: duke at openjdk.org (duke) Date: Wed, 7 May 2025 08:04:33 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' In-Reply-To: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 07:28:53 GMT, Manuel Hässig wrote: > This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. @mhaessig Your change (at version 366081599470a0cded5f8f970fec7ef62f455aa1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25082#issuecomment-2857504073 From duke at openjdk.org Wed May 7 08:04:34 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 7 May 2025 08:04:34 GMT Subject: RFR: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' [v2] In-Reply-To: References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 07:44:52 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/print/TestPrintAssemblyDeoptRace.java line 26: >> >>> 24: /* >>> 25: * @test >>> 26: * @bug 8258229 8356310 >> >> I'm not sure how much it helps a future reader that wants to figure what the test is about. Doesn't hurt so much either tho. > > That's a good point. Since it's only a follow-up fix to make the test work with product, you don't need to add the bug number of this issue. Good to know.
I removed it in [af03557](https://github.com/openjdk/jdk/pull/25082/commits/af03557b51a6cb7aa66b36968ecd522a25811e28) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25082#discussion_r2077023338 From duke at openjdk.org Wed May 7 08:19:24 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 7 May 2025 08:19:24 GMT Subject: Integrated: 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' In-Reply-To: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> References: <5qwnTkxex4jovLEWgVaTbCu3-Rycs_jKIgYH-S3S_r4=.ba854f92-0146-40d3-a7e9-6a3f249936de@github.com> Message-ID: On Wed, 7 May 2025 07:28:53 GMT, Manuel Hässig wrote: > This PR adds `-XX:+IgnoreUnrecognizedVMOptions` to fix `compiler/print/TestPrintAssemblyDeoptRace.java` on product builds. This pull request has now been integrated. Changeset: b5fd289f Author: Manuel Hässig Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/b5fd289f53e8380dfc38c3615acd10396ac647d5 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8356310: compiler/print/TestPrintAssemblyDeoptRace.java fails with Improperly specified VM option 'DeoptimizeALot' Reviewed-by: epeter, mchevalier, thartmann, chagedorn, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/25082 From duke at openjdk.org Wed May 7 08:27:54 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 7 May 2025 08:27:54 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: > I wrote a test to check whether every C2 IR node has a correct size_of() function, and I found that some of them are missing it. They added new fields but did not add size_of() to reflect the new size.
In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. > > PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Add missing size_of() in machnode.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25081/files - new: https://git.openjdk.org/jdk/pull/25081/files/c7054cb7..1eb11ad0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=00-01 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25081/head:pull/25081 PR: https://git.openjdk.org/jdk/pull/25081 From duke at openjdk.org Wed May 7 08:27:54 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 7 May 2025 08:27:54 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Wed, 7 May 2025 07:24:30 GMT, Christian Hagedorn wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. 
> > Good catch! It would currently only be a problem when we clone nodes which is probably hard to check statically (could, for example, be part of a loop body and then be cloned). > > Some questions: > - Have you also checked the Mach nodes? > - Have you also checked that `cmp()` is overridden in case `hash()` is not `NO_HASH` for those nodes that specify at least one field? > > Just a side node, you can also just use `sizeof(*this)` which is often done in the code. @chhagedorn I checked `machnode.hpp` manually and found some of them still miss `size_of()` . I added them in new patch. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2857595948 From mli at openjdk.org Wed May 7 08:36:21 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 7 May 2025 08:36:21 GMT Subject: Integrated: 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java In-Reply-To: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> References: <3UkoITinG0CBPVt9q5O8vpnHKh154itJ4STteFDM1cc=.b5da8c9f-2ca8-4d4a-91b6-70ae0a949a94@github.com> Message-ID: On Thu, 1 May 2025 11:31:50 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > Originally, I was going to enable all test cases on riscv in this test file. But seems there was already a try to implement RoundDoubleModeV (which is IRNode.ROUND_DOUBLE_MODE_V) in https://github.com/openjdk/jdk/pull/21164, but failed because of some performance regression. > So I'll just enable part of test cases in this pr. > > Thanks! This pull request has now been integrated. 
Changeset: da004cb6 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/da004cb6579c96c444faa55496db0056e9ac34e0 Stats: 24 lines in 1 file changed: 13 ins; 0 del; 11 mod 8356030: RISC-V: enable (part of) BasicDoubleOpTest.java Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/24983 From mli at openjdk.org Wed May 7 08:36:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 7 May 2025 08:36:22 GMT Subject: Integrated: 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB In-Reply-To: References: Message-ID: On Fri, 2 May 2025 12:19:53 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to add SUADD/SADD/SUSUB/SSUB for vector api? > > Thanks! > > ## Test > data > > Benchmark | (size) | Mode | Cnt | Score - master | Score - patch | improvement (master/patch) > -- | -- | -- | -- | -- | -- | -- > ByteMaxVector.SADD | 1024 | avgt | 10 | 23693.941 | 381.441 | 62.117 > ByteMaxVector.SSUB | 1024 | avgt | 10 | 24067.009 | 379.836 | 63.362 > ByteMaxVector.SUADD | 1024 | avgt | 10 | 24131.819 | 382.678 | 63.06 > ByteMaxVector.SUSUB | 1024 | avgt | 10 | 23140.494 | 380.768 | 60.773 > IntMaxVector.SADD | 1024 | avgt | 10 | 88526.058 | 1378.77 | 64.207 > IntMaxVector.SSUB | 1024 | avgt | 10 | 94204.768 | 1383.613 | 68.086 > IntMaxVector.SUADD | 1024 | avgt | 10 | 82470.743 | 1384.668 | 59.56 > IntMaxVector.SUSUB | 1024 | avgt | 10 | 84443.805 | 1759.69 | 47.988 > LongMaxVector.SADD | 1024 | avgt | 10 | 187690.117 | 3770.84 | 49.774 > LongMaxVector.SSUB | 1024 | avgt | 10 | 187334.716 | 3814.869 | 49.106 > LongMaxVector.SUADD | 1024 | avgt | 10 | 186891.578 | 2747.753 | 68.016 > LongMaxVector.SUSUB | 1024 | avgt | 10 | 186092.582 | 2730.588 | 68.151 > ShortMaxVector.SADD | 1024 | avgt | 10 | 43991.814 | 726.703 | 60.536 > ShortMaxVector.SSUB | 1024 | avgt | 10 | 40560.356 | 730.238 | 55.544 > ShortMaxVector.SUADD | 1024 | avgt | 10 | 43349.632 | 729.758 | 59.403 > ShortMaxVector.SUSUB | 1024 | avgt | 10 | 42686.701 | 726.059 | 58.792 > > 
This pull request has now been integrated. Changeset: 1a4bbb00 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/1a4bbb0027ae9e6df3b668454fa155861d531f72 Stats: 168 lines in 4 files changed: 146 ins; 1 del; 21 mod 8355699: RISC-V: support SUADD/SADD/SUSUB/SSUB Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/25005 From dlunden at openjdk.org Wed May 7 08:37:22 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 7 May 2025 08:37:22 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: <21d6dUz886V6-TbWTNyt22KX6UaBNbsfZQz3hnQVjNA=.edb3bbd3-0dba-4e54-92cf-88e680cc5149@github.com> References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> <21d6dUz886V6-TbWTNyt22KX6UaBNbsfZQz3hnQVjNA=.edb3bbd3-0dba-4e54-92cf-88e680cc5149@github.com> Message-ID: On Wed, 7 May 2025 07:44:52 GMT, Tobias Hartmann wrote: >> I would assume `test4` is a regression test and that the `assert` no longer triggers in any situation (otherwise we'd still have a bug)? Also, from what I can see, there is no VM flag that disables loop strip mining verification. >> >> Perhaps I'm still misunderstanding you? @TobiHartmann I see you added this test back in 2021, could you help bring us some clarity? This changeset only renames the occurrence of `insert_anti_dependences` in `TestSplitIfPinnedLoadInStripMinedLoop.java` to `raise_above_anti_dependences`. > >> Is this test still valid? > > I think it's as valid as any other regression test. With time, there's no guarantee that the test would still trigger the original issue if the fix would accidentally be reverted. 
But these tests still have a lot of value because they trigger state that apparently no other test triggered and often they still reproduce the issue with older JDK releases (or reveal new issues with new changes). > >> but this assert appears to be removed? > > It's still here, see: > https://github.com/openjdk/jdk/blob/910d77d39e6fb9ca339272c75fa4ff7ff99bffcf/src/hotspot/share/opto/gcm.cpp#L889-L890 > >> does the test on top of which the comment is placed (test4 right?) ever run with loop strip mining verification disabled and if it does, does the assert get triggered? > > It's not possible to disable loop strip mining but IIRC I disabled it manually to see what other issues we hit without verification, for example in release builds (see also my comments in https://github.com/openjdk/jdk/pull/2315). So this test basically covers a different failure mode. > > I think Daniel's change is good as it is. Thanks @TobiHartmann. Note that this changeset does remove the assert, and replaces it with another (stronger, in theory) assert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077109945 From mli at openjdk.org Wed May 7 08:37:26 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 7 May 2025 08:37:26 GMT Subject: Integrated: 8355704: RISC-V: enable TestIRFma.java In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 12:47:25 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch to enable TestIRFma.java? > FmaF/D (checked by TestIRFma.java) are supported on riscv, but for some reason we can not enable it easily, but we should enable it. > > NOTE: the reason I change IRNode matching rules is that, previously it verify the `FINAL CODE` where every platform could have different instruct name; I change it from machOnlyNameRegex to beforeMatchingNameRegex, to make it verify the `PrintIdeal` where every platform share the same names. > > Also tested on machine with `asimd` support. > > Thanks! 
This pull request has now been integrated. Changeset: 50554fa1 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/50554fa1982f042fb1d7b6c8a16334b97b31bb63 Stats: 39 lines in 2 files changed: 39 ins; 0 del; 0 mod 8355704: RISC-V: enable TestIRFma.java Reviewed-by: rehn, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/24947 From dlunden at openjdk.org Wed May 7 09:03:15 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 7 May 2025 09:03:15 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 7 May 2025 06:57:29 GMT, Damon Fenacci wrote: >> Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. >> >> ### Changeset >> >> Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). 
>> >> The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). >> >> I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. > > Looks good to me. Just a quick (curiosity) question: why did you choose to multiply by 3 and not, for instance, doubling the current max amount (i.e. x 4)? @dafedafe > Looks good to me. Just a quick (curiosity) question: why did you choose to multiply by 3 and not, for instance, doubling the current max amount (i.e. x 4)? Thanks for the review! We'd like to keep this bound as small as possible so that we do not get a too conservative IGVN node count bailout. But, I guess using 4 instead of 3 doesn't really matter in practice and perhaps it looks cleaner to use a power of 2. @robcasloz @eme64 > If we cannot bound the amount of nodes that can be created by PhiNode::Ideal, wouldn't it be more robust to simply disable the single-iteration node increase assertion for PhiNode? Otherwise there is the risk that we encounter the failure again with a slightly larger test case. Alternatively, if we could (?) 
derive a tighter bound for PhiNode (e.g. based on its number of inputs, number of memory slices for memory phis, etc.) we could try to compute it and use it in the assertion. > @robcasloz @dlunde Yes, such an exception may help us keep tight bounds on most nodes. And maybe we can even quantify more precisely how many nodes we expect to be created by PhiNode::Ideal. Maybe it is somehow linear in its inputs? The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration. Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) which describes the issue. > After the changes for JDK-8333393, we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in PhaseIterGVN::optimize. In particular, when we are close to the MaxNodeLimit (80 000 by default), it can happen that we go from below MaxNodeLimit - NodeLimitFudgeFactor * 2 (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the PhaseIterGVN::optimize loop does not trigger as expected and we instead crash at an assert in node creation as we surpass MaxNodeLimit nodes. I guess we could just remove [the assert during node creation](https://github.com/openjdk/jdk/blob/50554fa1982f042fb1d7b6c8a16334b97b31bb63/src/hotspot/share/opto/node.cpp#L78) as an alternative solution, or disable it during IGVN (similarly to how it is currently disabled during code generation). > @dlunde it would also be interesting to look more deeply into PhiNode::Ideal, and see what happens there. 
The Phi has 150+ inputs, but how does that generate 4k+ nodes? That would be 4000/150 ~ 25+ nodes per input. I'm just wondering if this is really sane? And is it profitable? Might it be better to check if we are creating that many nodes before doing it, and blowing through the node budget? It might be worth investigating. But I do hear that it is difficult to reproduce. I agree that we should investigate further, but suggest to do this as a separate RFE (to not continue polluting testing pipelines with the assert). My semi-educated guess is that the new nodes are added as part of the call to `MemNode::optimize_memory_chain` in `PhiNode::Ideal`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2857755141 From chagedorn at openjdk.org Wed May 7 09:13:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 May 2025 09:13:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: On Tue, 1 Apr 2025 07:18:45 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > typo Thanks for the updates in the meantime! 
Just some random comments here and there when browsing through the code and trying to grasp what is included everywhere. I will make more passes later :-) test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 59: > 57: * string with hashtag {@code #} "holes" that are then replaced by the template arguments and the > 58: * {@link #let} definitions. > 59: * Should we also insert `<p>` between the snippets? Same further down at method comments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 129: > 127: * ); > 128: * > 129: * // Use the template with one arguments, and render it to a String. Suggestion: * // Use the template with one argument, and render it to a String. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 422: > 420: * Multi-line string > 421: * """, > 422: * "normal string ", Integer.valueOf(3), Float.valueOf(1.5f), Is `valueOf()` required or would it auto-box? test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 428: > 426: * } > 427: * > 428: * @param tokens A list of tokens, which can be {@link String}s,boxed primitive types Suggestion: * @param tokens A list of tokens, which can be {@link String}s, boxed primitive types test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 492: > 490: * var template = Template.make("a", (Integer a) -> let("b", a * 2, (Integer b) -> body( > 491: * """ > 492: * System.out.prinln("Use a and b with hashtag replacement: #a and #b"); Suggestion: * System.out.println("Use a and b with hashtag replacement: #a and #b"); test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 494: > 492: * System.out.prinln("Use a and b with hashtag replacement: #a and #b"); > 493: * """, > 494: * "System.out.println(\"Use a and b as capture variables:\" + a + " and " + b + ");\n" Suggestion: * "System.out.println("Use a and b as capture variables:"" + a + " and " + b + ");\n" test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 539: > 537: * """ > 538: * System.out.println("Currently at depth #depth with fuel #fuel"); > 539: * """ Suggestion: * """, test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 141: > 139: """, > 140: // Call the testTemplate for each type and operator, generating a > 141: // list of list of TemplateWithArgs: lists? 
Suggestion: // list of lists of TemplateWithArgs: test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 151: > 149: )); > 150: > 151: // For each type, we chose a list of operators that do not throw exceptions. Suggestion: // For each type, we choose a list of operators that do not throw exceptions. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 159: > 157: new Type("long", () -> GEN_LONG.next(), List.of("+", "-", "*", "&", "|", "^")), > 158: new Type("float", () -> GEN_FLOAT.next(), List.of("+", "-", "*", "/")), > 159: new Type("double", () -> GEN_DOUBLE.next(), List.of("+", "-", "*", "/")) You can directly use `GEN_X::next` instead of `() -> GEN_X.next()`. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 162: > 160: ); > 161: > 162: // Use the template with one arguments, and render it to a String. Suggestion: // Use the template with one argument and render it to a String. 
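Two of the comments above (the `valueOf()` question and the `GEN_X::next` remark) can be illustrated with a small, self-contained sketch. It does not use the Template Framework: `Counter` is a hypothetical stand-in for the generators in `TestAdvanced.java`, and `tokens(Object...)` is a hypothetical varargs method, so whether `valueOf()` is really redundant in `Template.java` depends on the parameter types there.

```java
import java.util.List;
import java.util.function.Supplier;

public class MethodRefAndBoxing {
    // Hypothetical stand-in for a generator such as GEN_LONG.
    static class Counter {
        private long value = 0;
        long next() { return value++; }
    }

    // Hypothetical varargs method; assumes the tokens are passed as Object...
    static List<Object> tokens(Object... tokens) {
        return List.of(tokens);
    }

    public static void main(String[] args) {
        Counter gen = new Counter();

        // Both suppliers call gen.next(); the method reference is just shorter.
        Supplier<Long> viaLambda = () -> gen.next();
        Supplier<Long> viaMethodRef = gen::next;
        System.out.println(viaLambda.get());    // first value from the shared counter
        System.out.println(viaMethodRef.get()); // next value from the same counter

        // With an Object... parameter, primitives are boxed automatically,
        // so explicit valueOf() calls produce equal elements.
        List<Object> implicit = tokens("normal string ", 3, 1.5f);
        List<Object> explicit = tokens("normal string ", Integer.valueOf(3), Float.valueOf(1.5f));
        System.out.println(implicit.equals(explicit));
    }
}
```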
------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2820964558 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077181100 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077179114 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077177786 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077168801 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077184218 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077184500 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077174853 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077153706 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077154003 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077158958 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077159333 From thartmann at openjdk.org Wed May 7 09:31:19 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 7 May 2025 09:31:19 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> <21d6dUz886V6-TbWTNyt22KX6UaBNbsfZQz3hnQVjNA=.edb3bbd3-0dba-4e54-92cf-88e680cc5149@github.com> Message-ID: On Wed, 7 May 2025 08:34:28 GMT, Daniel Lundén wrote: >>> Is this test still valid? >> >> I think it's as valid as any other regression test. With time, there's no guarantee that the test would still trigger the original issue if the fix would accidentally be reverted. 
But these tests still have a lot of value because they trigger state that apparently no other test triggered and often they still reproduce the issue with older JDK releases (or reveal new issues with new changes). >> >>> but this assert appears to be removed? >> >> It's still here, see: >> https://github.com/openjdk/jdk/blob/910d77d39e6fb9ca339272c75fa4ff7ff99bffcf/src/hotspot/share/opto/gcm.cpp#L889-L890 >> >>> does the test on top of which the comment is placed (test4 right?) ever run with loop strip mining verification disabled and if it does, does the assert get triggered? >> >> It's not possible to disable loop strip mining but IIRC I disabled it manually to see what other issues we hit without verification, for example in release builds (see also my comments in https://github.com/openjdk/jdk/pull/2315). So this test basically covers a different failure mode. >> >> I think Daniel's change is good as it is. > > Thanks @TobiHartmann. Note that this changeset does remove the assert, and replaces it with another (stronger, in theory) assert. Ah right, I missed that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077222399 From adinn at openjdk.org Wed May 7 09:55:16 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 09:55:16 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> Message-ID: On Wed, 7 May 2025 03:51:32 GMT, Ashutosh Mehra wrote: > I suspect I missed porting a change from premain. @adinn @vnkozlov any idea what that could be? @ashu-mehra Just to explain what is going on here: This is a performance trick. When compressed oops base is null r12 (aka rheapbase) will have been initialized to zero so it can be used as a zero register. 
This allows, for example, a move instruction to employ a register operand immediate rather than include a 64 bit zero value in the instruction stream, which results in reduced code size. In this case the two moves are zeroing the current Java thread's frame anchor fields, last Java frame pc and sp, which are only set while the thread is in native. This trick is fine when generated code is run in the same JVM but no use if the code is generated in a VM with zero compressed oops base then reloaded into a JVM where it is no longer null. So, Vladimir's advice is to disable this trick when generating AOT code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2857925946 From epeter at openjdk.org Wed May 7 10:14:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 10:14:35 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v8] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
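To make the `#name` and `$var` items in the feature list above concrete, here is a toy sketch of the two replacements. It is not the Template Framework and does not use its API: `ToyTemplate`, `render`, and the explicit `id` parameter are all hypothetical, chosen only to show the idea of substituting argument values and making variable names unique per template use.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyTemplate {
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");
    private static final Pattern DOLLAR = Pattern.compile("\\$(\\w+)");

    // Replaces each "#name" with the matching argument value and each
    // "$name" with "name_<id>", where id identifies one template use.
    static String render(String body, Map<String, Object> args, int id) {
        String withArgs = HASHTAG.matcher(body).replaceAll(m -> {
            Object value = args.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no value for #" + m.group(1));
            }
            return Matcher.quoteReplacement(String.valueOf(value));
        });
        return DOLLAR.matcher(withArgs)
                     .replaceAll(m -> Matcher.quoteReplacement(m.group(1) + "_" + id));
    }

    public static void main(String[] args) {
        String body = "int $var = #a; $var += #b;";
        // Two uses of the same body get distinct variable names.
        System.out.println(render(body, Map.of("a", 42, "b", "7"), 7));  // int var_7 = 42; var_7 += 7;
        System.out.println(render(body, Map.of("a", 1, "b", "2"), 42));  // int var_42 = 1; var_42 += 2;
    }
}
```

In the real framework the unique suffix is presumably chosen internally rather than passed in; the explicit `id` here only keeps the sketch deterministic.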
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Whitespace - Suggestions by Christian Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/be1c0ee9..c4e5184e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=06-07 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 7 10:14:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 10:14:35 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: <51C7t1NybRKPPxsxLd25BO7i4DfkbPAYt3Nsy_bSelw=.78f74012-065e-4ac4-a2a3-a853ede155c7@github.com> On Tue, 1 Apr 2025 07:18:45 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > typo test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 144: > 142: * the {@link Template}s provide hashtag replacements in the Strings: the {@link Template} argument > 143: * names are captured, and the argument values automatically replace any {@code "#name"} in the Strings. See the > 144: * different overloads of {@link #make} for examples. Additional hashtag replacements can be defined Suggestion: * different overloads of {@link #make} for examples. Additional hashtag replacements can be defined ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077290499 From epeter at openjdk.org Wed May 7 10:17:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 10:17:39 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). 
We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 - Whitespace - Suggestions by Christian Co-authored-by: Christian Hagedorn - typo - For Christian: example and more intro - fix hashtag - manual merge - Apply suggestions from code review Co-authored-by: Christian Hagedorn - move library - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 ------------- Changes: https://git.openjdk.org/jdk/pull/24217/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=08 Stats: 4191 lines in 25 files changed: 4191 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From duke at openjdk.org Wed May 7 10:27:16 2025 From: duke at openjdk.org (erifan) Date: Wed, 7 May 2025 10:27:16 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 06:59:34 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. 
>> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.156... 
> > src/hotspot/share/opto/vectornode.cpp line 2231: > >> 2229: } >> 2230: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || >> 2231: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || > > Do you plan to extend your testcase / matching logic to cover following equivalent patterns: > > - compare.xor(maskAll(true)) > - compare.xor(VectorMask.fromLong(SPECIES, -1L)) Hi @jatin-bhateja It is feasible. But I was thinking about whether another solution would be better, which is to turn `VectorMask.fromLong(SPECIES, -1L)` into `MaskAll(true)` in the mid-end. In this way, we don't need to check this pattern in this optimization. What do you think ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077316033 From jbhateja at openjdk.org Wed May 7 11:06:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 11:06:21 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 10:24:14 GMT, erifan wrote: >> src/hotspot/share/opto/vectornode.cpp line 2231: >> >>> 2229: } >>> 2230: if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || >>> 2231: !((VectorMaskCmpNode*) in1)->predicate_can_be_inverted() || >> >> Do you plan to extend your testcase / matching logic to cover following equivalent patterns: >> >> - compare.xor(maskAll(true)) >> - compare.xor(VectorMask.fromLong(SPECIES, -1L)) > > Hi @jatin-bhateja It is feasible. But I was thinking about whether another solution would be better, which is to turn `VectorMask.fromLong(SPECIES, -1L)` into `MaskAll(true)` in the mid-end. In this way, we don't need to check this pattern in this optimization. What do you think ? Yes, that's the right approach. For this PR, I think you can mix some test points covering compare, xor(maskAll(true)). 
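The transformation under review, `(XorV (VectorMaskCmp src1 src2 cond) all-true) => (VectorMaskCmp src1 src2 ncond)`, can be checked lane by lane in plain Java. The sketch below does not use the Vector API: masks are modeled as `boolean[]`, and `Cmp`, `compare`, and `xorAllTrue` are hypothetical helpers. It also shows why, presumably, the PR restricts float and double to `eq` and `ne`: with NaN lanes, the negation of `lt` is not `ge`.

```java
import java.util.Arrays;

public class InvertedCompare {
    // Hypothetical lane-wise comparison predicate.
    interface Cmp { boolean test(double x, double y); }

    // Models VectorMaskCmp: one boolean per lane.
    static boolean[] compare(double[] a, double[] b, Cmp cond) {
        boolean[] mask = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            mask[i] = cond.test(a[i], b[i]);
        }
        return mask;
    }

    // Models XOR with an all-true mask, i.e. mask.not().
    static boolean[] xorAllTrue(boolean[] mask) {
        boolean[] result = new boolean[mask.length];
        for (int i = 0; i < mask.length; i++) {
            result[i] = mask[i] ^ true;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, Double.NaN, 4.0};
        double[] b = {2.0, 2.0, 3.0, Double.NaN};

        // not(eq) matches ne in every lane, even with NaN inputs.
        boolean eqInvertible = Arrays.equals(
            xorAllTrue(compare(a, b, (x, y) -> x == y)),
            compare(a, b, (x, y) -> x != y));

        // not(lt) differs from ge in the NaN lanes, since both lt and ge
        // are false when either operand is NaN.
        boolean ltInvertible = Arrays.equals(
            xorAllTrue(compare(a, b, (x, y) -> x < y)),
            compare(a, b, (x, y) -> x >= y));

        System.out.println("not(eq) == ne: " + eqInvertible); // true
        System.out.println("not(lt) == ge: " + ltInvertible); // false
    }
}
```

For integer lanes there is no unordered case, so every condition has an exact negation, which matches the longer list of conditions the PR allows for integer types.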
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077373581 From jbhateja at openjdk.org Wed May 7 11:09:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 11:09:16 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2] In-Reply-To: <7IGU54ppPKoebXOt0BaS9r-eYf92mtILbHLu8RsfmSk=.a0b3259c-e5ea-43a2-b34c-38d439bd0a41@github.com> References: <7IGU54ppPKoebXOt0BaS9r-eYf92mtILbHLu8RsfmSk=.a0b3259c-e5ea-43a2-b34c-38d439bd0a41@github.com> Message-ID: On Fri, 25 Apr 2025 09:17:02 GMT, Jatin Bhateja wrote: >> Thanks for telling me this information. Another more important reason to check outcnt here is to prevent this optimization when the uses of VectorMaskCmp is greater than 1, because this optimization may not be worthwhile. For example: >> >> >> public static void testVectorMaskCmp() { >> IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0); >> IntVector av = IntVector.fromArray(I_SPECIES, ia, 0); >> VectorMask m1 = av.compare(VectorOperators.NE, bv); // two uses >> VectorMask m2 =m1.not(); >> m1.intoArray(m, 0); >> av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0); >> } >> >> >> If we do not check outcnt and still do this optimization, two VectorMaskCmp nodes will be generated, and finally two VectorMaskCmp instructions will be generated. This is unreasonable because VectorMaskCmp has much higher latency than xor instruction on aarch64. > > Thanks, we can add this comment to the code where we are checking outcnt. What if all the other users are also XorNodes?. At present, you are checking for one XOR user; shouldn't it be all or one scenario? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077378879 From duke at openjdk.org Wed May 7 11:21:45 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 7 May 2025 11:21:45 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v4] In-Reply-To: References: Message-ID: > Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. > > Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. > > Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in `expand_macro_nodes` is unintuitive as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. 
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: addressing review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24890/files - new: https://git.openjdk.org/jdk/pull/24890/files/a45e1340..e8bcbc9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=02-03 Stats: 4 lines in 2 files changed: 3 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24890/head:pull/24890 PR: https://git.openjdk.org/jdk/pull/24890 From jbhateja at openjdk.org Wed May 7 11:24:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 7 May 2025 11:24:17 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 02:10:56 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>>
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`:
>>
>> Benchmark                   Unit   Before       Score Error  After        Score Error  Uplift
>> testCompareEQMaskNotByte    ops/s  7912127.225  2677.289518  10266136.26  8955.008548  1.29
>> testCompareEQMaskNotDouble  ops/s  884737.6799  446.963779   1179760.772  448.031844   1.33
>> testCompareEQMaskNotFloat   ops/s  1765045.787  682.332214   2359520.803  896.305743   1.33
>> testCompareEQMaskNotInt     ops/s  1787221.411  977.743935   2353952.519  960.069976   1.31
>> testCompareEQMaskNotLong    ops/s  895297.1974  673.44808    1178449.02   323.804205   1.31
>> testCompareEQMaskNotShort   ops/s  3339987.002  3415.2226    4712761.965  2110.862053  1.41
>> testCompareGEMaskNotByte    ops/s  7907615.16   4094.243652  10251646.9   9486.699831  1.29
>> testCompareGEMaskNotInt     ops/s  1683738.958  4233.813092  2352855.205  1251.952546  1.39
>> testCompareGEMaskNotLong    ops/s  854496.1561  8594.598885  1177811.493  521.1229     1.37
>> testCompareGEMaskNotShort   ops/s  3341860.309  1578.975338  4714008.434  1681.10365   1.41
>> testCompareGTMaskNotByte    ops/s  7910823.674  2993.367032  10245063.58  9774.75138   1.29
>> testCompareGTMaskNotInt     ops/s  1673393.928  3153.099431  2353654.521  1190.848583  1.4
>> testCompareGTMaskNotLong    ops/s  849405.9159  2432.858159  1177952.041  359.96413    1.38
>> testCompareGTMaskNotShort   ops/s  3339509.141  3339.976585  4711442.496  2673.364893  1.41
>> testCompareLEMaskNotByte    ops/s  7911340.004  3114.69191   10231626.5   27134.20035  1.29
>> testCompareLEMaskNotInt     ops/s  1675812.113  1340.969885  2353255.341  1452.4522    1.4
>> testCompareLEMaskNotLong    ops/s  848862.8036  6564.841731  1177763.623  539.290106   1.38
>> testCompareLEMaskNotShort   ops/s  3324951.54   2380.29473   4712116.251  1544.559684  1.41
>> testCompareLTMaskNotByte    ops/s  7910390.844  2630.861436  10239567.69  6487.441672  1.29
>> testCompareLTMaskNotInt     ops/s  16721...
>
> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: > > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
>
> Benchmarks on Nvidia Grace machine with 128-bit SVE2:
> With option `-XX:UseSVE=2`:
> ```
> Benchmark                   Unit   Before       Score Error  After        Score Error  Uplift
> testCompareEQMaskNotByte    ops/s  7912127.225  2677.289518  10266136.26  8955.008548  1.29
> testCompareEQMaskNotDouble  ops/s  884737.6799  446.963779   1179760.772  448.031844   1.33
> testCompareEQMaskNotFloat   ops/s  1765045.787  682.332214   2359520.803  896.305743   1.33
> testCompareEQMaskNotInt     ops/s  1787221.411  977.743935   2353952.519  960.069976   1.31
> testCompareEQMaskNotLong    ops/s  895297.1974  673.44808    1178449.02   323.804205   1.31
> testCompareEQMaskNotShort   ops/s  3339987.002  3415.2226    4712761.965  2110.862053  1.41
> testCompareGEMaskNotByte    ops/s  7907615.16   4094.243652  10251646.9   9486.699831  1.29
> testCompareGEMaskNotInt     ops/s  1683738.958  4233.813092  2352855.205  1251.952546  1.39
> testCompareGEMaskNotLong    ops/s  854496.1561  8594.598885  1177811.493  521.1229     1.37
> testCompareGEMaskNotShort   ops/s  3341860.309  1578.975338  4...

test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 40:

> 38: @Fork(jvmArgs = { "--add-modules=jdk.incubator.vector" })
> 39: public class MaskCompareNotBenchmark {
> 40: private static final int ARRAYLEN = 4096;

`ARRAYLEN` should be a configurable `@Param`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077390413

From gbarany at openjdk.org Wed May 7 11:27:36 2025
From: gbarany at openjdk.org (Gergö Barany)
Date: Wed, 7 May 2025 11:27:36 GMT
Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java
Message-ID:

Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed.
-------------

Commit messages:
 - 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java

Changes: https://git.openjdk.org/jdk/pull/25088/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25088&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8354443
Stats: 11 lines in 1 file changed: 0 ins; 9 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/25088.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25088/head:pull/25088

PR: https://git.openjdk.org/jdk/pull/25088

From rcastanedalo at openjdk.org Wed May 7 11:28:15 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 7 May 2025 11:28:15 GMT
Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000)
In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com>
References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com>
Message-ID:

On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lundén wrote:

> Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic.
>
> ### Changeset
>
> Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below).
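Editorial sketch: the limits quoted above are plain arithmetic on two defaults, and can be checked with a few lines of Java. The class below is hypothetical and only mirrors the numbers in the message (a `NodeLimitFudgeFactor` default of 2000 is implied by "`* 2` (4000 by default)", and `MaxNodeLimit` is taken as 80000); it is not the C2 implementation.

```java
// Illustrative arithmetic only; names are hypothetical and the constants mirror
// the defaults quoted in the message above.
final class IgvnNodeBudget {
    static final int NODE_LIMIT_FUDGE_FACTOR = 2000; // implied by "* 2 (4000 by default)"
    static final int MAX_NODE_LIMIT = 80000;

    // Maximum live node increase tolerated in one IGVN iteration.
    static int maxIncreasePerIteration(int multiplier) {
        return NODE_LIMIT_FUDGE_FACTOR * multiplier;
    }

    // Live node count at which IGVN bails out.
    static int bailoutThreshold(int multiplier) {
        return MAX_NODE_LIMIT - maxIncreasePerIteration(multiplier);
    }

    // Models the assert condition: increase < max_live_nodes_increase_per_iteration.
    static boolean withinLimit(int increase, int multiplier) {
        return increase < maxIncreasePerIteration(multiplier);
    }

    public static void main(String[] args) {
        System.out.println("before: limit " + maxIncreasePerIteration(2)
                + ", bailout at " + bailoutThreshold(2));
        System.out.println("after:  limit " + maxIncreasePerIteration(3)
                + ", bailout at " + bailoutThreshold(3));
        // 4470 is the increase reported in the subject line of this thread.
        System.out.println("4470 ok before? " + withinLimit(4470, 2));
        System.out.println("4470 ok after?  " + withinLimit(4470, 3));
    }
}
```

With these inputs the increase of 4470 reported in the subject line exceeds the old limit but stays within the new one.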
>
> The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations).
>
> I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit).
>
> ### Testing
>
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
> - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset.

Marked as reviewed by rcastanedalo (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/24960#pullrequestreview-2821390391

From rcastanedalo at openjdk.org Wed May 7 11:28:16 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 7 May 2025 11:28:16 GMT
Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000)
In-Reply-To:
References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com>
Message-ID:

On Wed, 7 May 2025 09:00:51 GMT, Daniel Lundén wrote:

> The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration.
> Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) which describes the issue.

Fair enough, thanks for the explanation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2858208510

From jbhateja at openjdk.org Wed May 7 11:40:05 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 7 May 2025 11:40:05 GMT
Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15]
In-Reply-To:
References:
Message-ID:

> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future.
> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. The feature set supported by an AVX10 ISA version will be supported by all the versions above it.
> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions.
> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>
> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save and restoration, AVX10 is isomorphic to AVX512 support up to Granite Rapids. State components affected by the AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>
> The patch has been regression-tested with tier1 and jvmci tests.
>
> Please review and share your feedback.
>
> Best Regards,
> Jatin
>
> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html

Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision:

 - Making _features_bitmap size configurable
 - cleanups & refactorings

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/24329/files
 - new: https://git.openjdk.org/jdk/pull/24329/files/650e3d61..cfc09d05

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=13-14

Stats: 192 lines in 9 files changed: 58 ins; 87 del; 47 mod
Patch: https://git.openjdk.org/jdk/pull/24329.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329

PR: https://git.openjdk.org/jdk/pull/24329

From epeter at openjdk.org Wed May 7 11:44:15 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 7 May 2025 11:44:15 GMT
Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000)
In-Reply-To:
References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com>
Message-ID: <3CAMBOGrG_N4pDKewDh5lREic6ulRh-WXqR1iV034NY=.f9925ac4-f978-4a2c-9ac0-83e822cb8771@github.com>

On Wed, 7 May 2025 09:00:51 GMT, Daniel Lundén wrote:

> I agree that we should investigate further, but suggest to do this as a separate RFE (to not continue polluting testing pipelines with the assert). My semi-educated guess is that the new nodes are added as part of the call to MemNode::optimize_memory_chain in PhiNode::Ideal.

@dlunde Ok, then let's declare this as a "quickfix", and file a follow-up RFE. Maybe it should also be declared a lower-priority bug?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2858257342 From epeter at openjdk.org Wed May 7 11:50:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 11:50:23 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: On Wed, 7 May 2025 08:57:18 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> typo > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 159: > >> 157: new Type("long", () -> GEN_LONG.next(), List.of("+", "-", "*", "&", "|", "^")), >> 158: new Type("float", () -> GEN_FLOAT.next(), List.of("+", "-", "*", "/")), >> 159: new Type("double", () -> GEN_DOUBLE.next(), List.of("+", "-", "*", "/")) > > You can directly use `GEN_X::next` instead of `() -> GEN_X.next()`. Ah thanks, nice simplification! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077446307 From epeter at openjdk.org Wed May 7 11:56:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 11:56:24 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: On Wed, 7 May 2025 09:05:20 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> typo > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 422: > >> 420: * Multi-line string >> 421: * """, >> 422: * "normal string ", Integer.valueOf(3), Float.valueOf(1.5f), > > Is `valueOf()` required or would it auto-box? Just updated the `TestTemplate.java`, and it would auto-box. I'll improve the example here. 
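Editorial sketch: the auto-boxing answer above can be reproduced without the framework. The snippet assumes a varargs method taking `Object...`; the `tokens` helper below is a hypothetical stand-in and not the Template Framework's actual signature. Primitive arguments passed to such a method are boxed by the compiler, so the explicit `valueOf()` calls produce the same list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a varargs token list; not the Template Framework API.
final class AutoBoxTokens {
    static List<Object> tokens(Object... tokens) {
        return Arrays.asList(tokens);
    }

    public static void main(String[] args) {
        // Explicit boxing, as in the documentation example under review.
        List<Object> explicit = tokens("normal string ", Integer.valueOf(3), Float.valueOf(1.5f));
        // Same call relying on auto-boxing of the int and float literals.
        List<Object> boxed = tokens("normal string ", 3, 1.5f);

        System.out.println(explicit.equals(boxed));
        System.out.println(boxed.get(1).getClass().getSimpleName());
        System.out.println(boxed.get(2).getClass().getSimpleName());
    }
}
```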
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077452189 From epeter at openjdk.org Wed May 7 11:56:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 11:56:24 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: On Wed, 7 May 2025 11:50:43 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 422: >> >>> 420: * Multi-line string >>> 421: * """, >>> 422: * "normal string ", Integer.valueOf(3), Float.valueOf(1.5f), >> >> Is `valueOf()` required or would it auto-box? > > Just updated the `TestTemplate.java`, and it would auto-box. I'll improve the example here. Expanded the example, and comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077456136 From chagedorn at openjdk.org Wed May 7 12:03:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 7 May 2025 12:03:10 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 10:17:39 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: > > - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 > - Whitespace > - Suggestions by Christian > > Co-authored-by: Christian Hagedorn > - typo > - For Christian: example and more intro > - fix hashtag > - manual merge > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - move library > - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 > - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 Next batch of comments. Will probably resume tomorrow :-) test/hotspot/jtreg/compiler/lib/template_framework/README.md line 6: > 4: We want to make it easy to generate variants of tests. Often, we would like to have a set of tests, corresponding to a set of types, a set of operators, a set of constants, etc. Writing all the tests by hand is cumbersome or even impossible. When generating such tests with scripts, it would be preferable if the code generation happens automatically, and the generator script was checked into the code base. Code generation can go beyond simple regression tests, and one might want to generate random code from a list of possible templates, to fuzz individual Java features and compiler optimizations. > 5: > 6: The Template Framework provides a facility to generate code with Templates. Templates are essencially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of the Templates. Suggestion: The Template Framework provides a facility to generate code with Templates. Templates are essencially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. 
For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. test/hotspot/jtreg/compiler/lib/template_framework/README.md line 8: > 6: The Template Framework provides a facility to generate code with Templates. Templates are essencially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of the Templates. > 7: > 8: The Template Framework only generates code in the form of a String. This code can then be compiled and executed, for example with help of the [Compile Framework](../compile_framework/README.md). Suggestion: The Template Framework only generates code in the form of a String. This code can then be compiled and executed, for example with the help of the [Compile Framework](../compile_framework/README.md). test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 50: > 48: * filled (replaced) by different values at each Template instantiation. For example, these "holes" can > 49: * be filled with different types, operators or constants. Templates can also be nested, allowing a modular > 50: * use of the Templates. Suggestion: * use of Templates. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 54: > 52: *

> 53: * Example: > 54: * The following are snippets from the example test {@code TestAdvanced.java}. Suggestion: * The following snippets are from the example test {@code TestAdvanced.java}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 57: > 55: * First, we define a template that generates a {@code @Test} method for a given type, operator and > 56: * constant generator. We define two constants {@code con1} and {@code con2}, and then use a multiline > 57: * string with hashtag {@code #} "holes" that are then replaced by the template arguments and the Suggestion: * string with hashtags {@code #} (i.e. "holes") that are then replaced by the template arguments and the test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 81: > 79: * } > 80: * > 81: * To get an executable test, we define a class Template, which takes a list of types, Not entirely clear what you mean with "a class Template". Do you mean "we define a Template"? test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 82: > 80: * > 81: * To get an executable test, we define a class Template, which takes a list of types, > 82: * and calls the test template for each type and operator. We use the {@code TestFramework} Suggestion: * and calls the {@code testTemplate} defined above for each type and operator. 
We use the {@code TestFramework}

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 105:

> 103: * """,
> 104: * // Call the testTemplate for each type and operator, generating a
> 105: * // list of list of TemplateWithArgs:

Suggestion:

* // list of lists of TemplateWithArgs:

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 126:

> 124: * new Type("long", () -> GEN_LONG.next(), List.of("+", "-", "*", "&", "|", "^")),
> 125: * new Type("float", () -> GEN_FLOAT.next(), List.of("+", "-", "*", "/")),
> 126: * new Type("double", () -> GEN_DOUBLE.next(), List.of("+", "-", "*", "/"))

Same here as commented earlier: You can directly use `GEN_X::next`.

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 166:

> 164: *
> 165: *

> 166: * A {@link TemplateBinding} allows the recurisve use of {@link Template}s. With the indirection of such a binding, Suggestion: * A {@link TemplateBinding} allows the recursive use of {@link Template}s. With the indirection of such a binding, test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 170: > 168: * with a certain amount of {@link #fuel}, which is decreased at each {@link Template} nesting by a certain amount > 169: * (can be changed with {@link #setFuelCost}). Recursive templates are supposed to terminate once the {@link #fuel} > 170: * is depleated (i.e. reaches zero). Suggestion: * is depleted (i.e. reaches zero). test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 175: > 173: * Code generation often involves defining fields and variables, which are then available inside a defined > 174: * scope, and can be sampled in any nested scope. To allow the use of names for multiple applications (e.g. > 175: * fields, variables, methods, etc), we define a {@link Name}, which captures the {@link String} representation Suggestion: * fields, variables, methods, etc.), we define a {@link Name}, which captures the {@link String} representation test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 178: > 176: * to be used in code, as well as its type and if it is mutable. One can add such a {@link Name} to the > 177: * current code scope with {@link #addName}, and sample from the current or outer scopes with {@link #sampleName}. > 178: * When generating code, one might want to create {@link Name}s (variables, fields, etc) in local scope, or Suggestion: * When generating code, one might want to create {@link Name}s (variables, fields, etc.) in local scope, or test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 179: > 177: * current code scope with {@link #addName}, and sample from the current or outer scopes with {@link #sampleName}. 
> 178: * When generating code, one might want to create {@link Name}s (variables, fields, etc) in local scope, or > 179: * in some outer scope with the use of {@link Hook}s. Maybe mention here again that all of the explained above can be found in tutorial like examples (I guess in `TestTutorial`)?. Because it was not that easy to grasp how these different options to create Templates now work in practice. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 61: > 59: CompileFramework comp = new CompileFramework(); > 60: > 61: // Add java source files. Maybe it would also be nice to see the actually generated strings for the templates. Should we add an easy way to do this just for the tutorials in this file? Maybe we can do it by asking the user to pass an environment property like `-DPrintTemplates=true` or something like that. Or is there already a way provided by the framework to print the resulting templates on demand? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 85: > 83: // This example shows the use of various Tokens. > 84: public static String generateWithListOfTokens() { > 85: // A Template is essencially a function / lambda that produces a Suggestion: // A Template is essentially a function / lambda that produces a test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 89: > 87: var templateClass = Template.make(() -> body( > 88: // The "body" method is filled by a sequence of "Tokens". > 89: // This can be Strings and multi-line Strings, but also Suggestion: // These can be Strings and multi-line Strings, but also test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 98: > 96: System.out.println("Hello World!"); > 97: """, > 98: "int a = ", Integer.valueOf(1), ";\n", Might be better to use `System.lineSeparator()` instead of `\n` to be platform independent. 
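Editorial sketch: the `System.lineSeparator()` suggestion above, shown in isolation. The helper name `joinStatements` is hypothetical and only illustrates terminating each generated statement with the platform line separator instead of a hard-coded `\n`.

```java
// Hypothetical helper, not part of the Template Framework.
final class LineSeparators {
    // Appends the platform line separator after each generated statement.
    static String joinStatements(String... statements) {
        StringBuilder sb = new StringBuilder();
        for (String statement : statements) {
            sb.append(statement).append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String code = joinStatements("int a = 1;", "boolean c = true;");
        System.out.print(code);
        // The separator is "\n" on Unix-like systems and "\r\n" on Windows.
        System.out.println("separator length: " + System.lineSeparator().length());
    }
}
```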
test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 102: > 100: // Special Float values are "smartly" formatted! > 101: "float nan = ", Float.valueOf(Float.POSITIVE_INFINITY), ";\n", > 102: "boolean c = ", Boolean.valueOf(true), ";\n", Are these explicit calls to `valueOf()` necessary? Aren't these auto-boxed? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 105: > 103: // Lists of Tokens are also allowed: > 104: List.of("int ", "d = 5", ";\n"), > 105: // That can be great for streaming / mapping over an existing list: By "that" you just mean the following line? Maybe rephrase to: "We can also stream / map over an existing list or one created on the fly: test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 106: > 104: List.of("int ", "d = 5", ";\n"), > 105: // That can be great for streaming / mapping over an existing list: > 106: List.of(3, 5, 7, 11).stream().map(i -> "System.out.println(" + i + ");\n").toList(), You can use `Stream.of()`: Suggestion: Stream.of(3, 5, 7, 11).map(i -> "System.out.println(" + i + ");\n").toList(), test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 131: > 129: "System.out.println(", arg, ");\n", // capture arg via lambda argument > 130: "System.out.println(#arg);\n", // capture arg via hashtag replacement > 131: "if (#arg != ", arg, ") { throw new RuntimeException(\"mismatch\"); }\n" When should I use the lambda argument and when the hashtag replacement? Maybe add a comment here for some guidance or link to later tutorials where it becomes obvious. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 143: > 141: public static void main() { > 142: """, > 143: templateHello.withArgs(), `withArgs()` looks strange when there are no args. Could we find a better name for it? But maybe I'm missing a pattern here. 
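Editorial sketch: the `Stream.of()` suggestion made earlier in this review produces the same list as `List.of(...).stream()`. The class below is a hypothetical, framework-free illustration; it uses `Collectors.toList()` rather than `Stream.toList()` only so that it also compiles on JDK versions before 16.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration; not taken from TestTutorial.java.
final class StreamOfTokens {
    // Original form: build a List first, then stream it.
    static List<String> viaList() {
        return List.of(3, 5, 7, 11).stream()
                .map(i -> "System.out.println(" + i + ");\n")
                .collect(Collectors.toList());
    }

    // Suggested form: create the stream directly.
    static List<String> viaStream() {
        return Stream.of(3, 5, 7, 11)
                .map(i -> "System.out.println(" + i + ");\n")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaList().equals(viaStream()));
        viaStream().forEach(System.out::print);
    }
}
```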
test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 156:

> 154: }
> 155:
> 156: // Example with hashtag replacements (arguments and let), and $-name renamings.

Taking a break now from reviewing. Bookmark for myself :-)

-------------

PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2821262179
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077333259
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077333828
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077338055
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077338763
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077341028
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077349886
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077350455
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077350740
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077355385
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077416870
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077418958
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077420881
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077421229
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077422864
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077453865
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077433478
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077434239
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077441698
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077435250
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077444105
PR Review Comment:
https://git.openjdk.org/jdk/pull/24217#discussion_r2077436099 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077457338 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077460724 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077465045 From epeter at openjdk.org Wed May 7 12:03:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:03:12 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: References: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> Message-ID: On Wed, 7 May 2025 09:07:16 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> typo > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 59: > >> 57: * string with hashtag {@code #} "holes" that are then replaced by the template arguments and the >> 58: * {@link #let} definitions. >> 59: * > > Should we also insert `

` between the snippets? Same further down at method comments. Ah nice catch! I put them in at the beginning, but then expanded the documentation and forgot it there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077464122 From epeter at openjdk.org Wed May 7 12:03:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:03:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v10] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: 3 more suggestions by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/fae7ced6..9c95f6aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=08-09 Stats: 20 lines in 3 files changed: 5 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 7 12:10:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:10:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v11] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: More suggestions by Christian Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/9c95f6aa..f689a902 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=09-10 Stats: 14 lines in 3 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 7 12:14:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:14:25 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 10:46:58 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 81: > >> 79: * } >> 80: * >> 81: * To get an executable test, we define a class Template, which takes a list of types, > > Not entirely clear what you mean with "a class Template". Do you mean "we define a Template"? I meant to say "a Template that has a class body with a main method" or similar. Will update the comment. 
> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 126: > >> 124: * new Type("long", () -> GEN_LONG.next(), List.of("+", "-", "*", "&", "|", "^")), >> 125: * new Type("float", () -> GEN_FLOAT.next(), List.of("+", "-", "*", "/")), >> 126: * new Type("double", () -> GEN_DOUBLE.next(), List.of("+", "-", "*", "/")) > > Same here as commented earlier: You can directly use `GEN_X::next()`. done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077483285 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077486002 From dlunden at openjdk.org Wed May 7 12:26:18 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 7 May 2025 12:26:18 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 7 May 2025 11:25:42 GMT, Roberto Castañeda Lozano wrote: >> @dafedafe >>> Looks good to me. Just a quick (curiosity) question: why did you choose to multiply by 3 and not, for instance, doubling the current max amount (i.e. x 4)? >> >> Thanks for the review! We'd like to keep this bound as small as possible so that we do not get a too conservative IGVN node count bailout. But, I guess using 4 instead of 3 doesn't really matter in practice and perhaps it looks cleaner to use a power of 2. >> >> @robcasloz @eme64 >>> If we cannot bound the amount of nodes that can be created by PhiNode::Ideal, wouldn't it be more robust to simply disable the single-iteration node increase assertion for PhiNode? Otherwise there is the risk that we encounter the failure again with a slightly larger test case. Alternatively, if we could (?) derive a tighter bound for PhiNode (e.g. 
based on its number of inputs, number of memory slices for memory phis, etc.) we could try to compute it and use it in the assertion. >> >>> @robcasloz @dlunde Yes, such an exception may help us keep tight bounds on most nodes. And maybe we can even quantify more precisely how many nodes we expect to be created by PhiNode::Ideal. Maybe it is somehow linear in its inputs? >> >> The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration. Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) which describes the issue. >> >>> After the changes for JDK-8333393, we apply a Phi idealization, involving splitting Phis through MergeMems, a lot more frequently. This idealization internally applies further idealizations for new Phi nodes generated during the idealization. In certain cases, these internal idealizations result in a large increase of live nodes within a single iteration of the main IGVN loop in PhaseIterGVN::optimize. In particular, when we are close to the MaxNodeLimit (80 000 by default), it can happen that we go from below MaxNodeLimit - NodeLimitFudgeFactor * 2 (= 76 000 by default) to more than 80 000 nodes in a single iteration. In such cases, the node count bailout at the top of the PhaseIterGVN::optimize loop does not trigger as expected and we instead crash at an assert in node creation as we surpass MaxNodeLimit nodes. >> >> I guess we could just remove [the assert during node creation](https://github.com/openjdk/jdk/blob/50554fa1982f042fb1d7b6c8a16334b97b31bb63/src/hotspot/share/opto/nod... > >> The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration. 
Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) which describes the issue. > > Fair enough, thanks for the explanation. Thanks for the review @robcasloz! @eme64 > @dlunde Ok, then let's declare this as a "quickfix", and file a follow-up RFE. Maybe it should also be declared a lower-priority bug? Sounds good to me. Yes, definitely lower priority for now. We do not even know if the transformation is expected or not, although I agree it looks suspicious. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2858382888 From epeter at openjdk.org Wed May 7 12:26:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:26:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> On Wed, 7 May 2025 11:31:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 179: > >> 177: * current code scope with {@link #addName}, and sample from the current or outer scopes with {@link #sampleName}. >> 178: * When generating code, one might want to create {@link Name}s (variables, fields, etc) in local scope, or >> 179: * in some outer scope with the use of {@link Hook}s. 
> > Maybe mention here again that all of the explained above can be found in tutorial like examples (I guess in `TestTutorial`)?. Because it was not that easy to grasp how these different options to create Templates now work in practice. Ok, fair. This is just a high level explanation. Especially Hooks and Names are also not the "starter features", I think. So it's ok if you have to go look at the examples or other uses, I think. > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 98: > >> 96: System.out.println("Hello World!"); >> 97: """, >> 98: "int a = ", Integer.valueOf(1), ";\n", > > Might be better to use `System.lineSeparator()` instead of `\n` to be platform independent. Hmm. You may be right. But then again, this also worked on Windows... what platform would it even fail on? > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 102: > >> 100: // Special Float values are "smartly" formatted! >> 101: "float nan = ", Float.valueOf(Float.POSITIVE_INFINITY), ";\n", >> 102: "boolean c = ", Boolean.valueOf(true), ";\n", > > Are these explicit calls to `valueOf()` necessary? Aren't these auto-boxed? Removed the boxing, good idea! 
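On the `valueOf()` question above, a minimal sketch of why the explicit boxing can be dropped. It assumes only that the tokens end up in an `Object`-typed parameter list; `BoxingSketch` and `render` are hypothetical stand-ins, not the Template Framework API, and unlike the real framework this sketch does no "smart" formatting of special float values.

```java
public class BoxingSketch {
    // Hypothetical stand-in for a token list: accepts any Object and
    // concatenates the string form of each token.
    static String render(Object... tokens) {
        StringBuilder sb = new StringBuilder();
        for (Object token : tokens) {
            sb.append(token);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Explicit boxing, as in the original tutorial lines.
        String explicit = render("int a = ", Integer.valueOf(1), ";\n",
                                 "boolean c = ", Boolean.valueOf(true), ";\n");
        // Primitive literals are auto-boxed when passed as Object arguments.
        String boxed = render("int a = ", 1, ";\n",
                              "boolean c = ", true, ";\n");
        System.out.println(explicit.equals(boxed));
    }
}
```

Both calls produce the same string, so the shorter form loses nothing in this setting.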
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077497848 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077501215 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077500191 From epeter at openjdk.org Wed May 7 12:26:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:26:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> References: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> Message-ID: On Wed, 7 May 2025 12:21:28 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 98: >> >>> 96: System.out.println("Hello World!"); >>> 97: """, >>> 98: "int a = ", Integer.valueOf(1), ";\n", >> >> Might be better to use `System.lineSeparator()` instead of `\n` to be platform independent. > > Hmm. You may be right. But then again, this also worked on Windows... what platform would it even fail on? Plus, it is really clunky to use the much longer `System.lineSeparator()` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077502445 From epeter at openjdk.org Wed May 7 12:26:28 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:26:28 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> Message-ID: On Wed, 7 May 2025 12:22:11 GMT, Emanuel Peter wrote: >> Hmm. You may be right. But then again, this also worked on Windows... what platform would it even fail on? > > Plus, it is really clunky to use the much longer `System.lineSeparator()` ? I prefer multiline strings, but that does not always work. `\n` is just less of a disturbance. 
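For the `\n` versus `System.lineSeparator()` discussion, a small sketch of the two styles mentioned: a multiline string (text block, Java 15+) and explicit `\n` escapes. Text blocks normalize line terminators to `\n` on every platform, so the two forms yield the same generated source. `NewlineSketch` and its method names are made up for illustration.

```java
public class NewlineSketch {
    // Text block: line terminators are normalized to \n, regardless of
    // the platform or the line endings of this source file.
    static String fromTextBlock() {
        return """
            int a = 1;
            int b = 2;
            """;
    }

    // The same content written with explicit \n escapes.
    static String fromEscapes() {
        return "int a = 1;\n" + "int b = 2;\n";
    }

    public static void main(String[] args) {
        System.out.println(fromTextBlock().equals(fromEscapes()));
    }
}
```

Since the Java language accepts LF, CR, and CRLF as line terminators in source code, generated source that uses plain `\n` should compile the same way on Windows, which matches the observation above that the tests also passed there.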
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077505572 From epeter at openjdk.org Wed May 7 12:29:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 12:29:21 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:45:46 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 105: > >> 103: // Lists of Tokens are also allowed: >> 104: List.of("int ", "d = 5", ";\n"), >> 105: // That can be great for streaming / mapping over an existing list: > > By "that" you just mean the following line? Maybe rephrase to: "We can also stream / map over an existing list or one created on the fly: haha, now we kinda removed the list, since we are streaming directly. I think I will revert your suggestion here, back to `List.of().stream`, just to make clear that we can do all of that.
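To make the two variants from this exchange concrete, here is a rough sketch comparing `List.of(...).stream()` (mapping over an existing list) with `Stream.of(...)` (a stream created on the fly). `StreamSketch` and its method names are hypothetical; only the mapping lambda is taken from the tutorial line under review. `Stream.toList()` needs Java 16 or later.

```java
import java.util.List;
import java.util.stream.Stream;

public class StreamSketch {
    // Map over an existing list, as in the original tutorial line.
    static List<String> viaList(List<Integer> values) {
        return values.stream()
                     .map(i -> "System.out.println(" + i + ");\n")
                     .toList();
    }

    // Create the stream directly, as in the review suggestion.
    static List<String> viaStreamOf() {
        return Stream.of(3, 5, 7, 11)
                     .map(i -> "System.out.println(" + i + ");\n")
                     .toList();
    }

    public static void main(String[] args) {
        List<String> a = viaList(List.of(3, 5, 7, 11));
        List<String> b = viaStreamOf();
        System.out.println(a.equals(b));
    }
}
```

Both produce the same list of statements. The `List.of(...).stream()` form mainly documents that an already existing list can be mapped, which seems to be the point of keeping it in the tutorial.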
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2077510202 From roland at openjdk.org Wed May 7 13:20:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 May 2025 13:20:23 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: References: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> Message-ID: On Wed, 23 Apr 2025 09:18:27 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: >> >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge >> - Merge branch 'master' into JDK-8342692 >> - Merge branch 'master' into JDK-8342692 >> - whitespace >> - Merge branch 'master' into JDK-8342692 >> - TestMemorySegment test fix >> - ... and 31 more: https://git.openjdk.org/jdk/compare/dc5c4148...065abb29 > > test/hotspot/jtreg/compiler/rangechecks/TestLongRangeCheck.java line 37: > >> 35: * @run main/othervm -ea -Xbootclasspath/a:. -XX:+UnlockDiagnosticVMOptions -XX:+WhiteBoxAPI -XX:-BackgroundCompilation -XX:-UseOnStackReplacement >> 36: * -XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=5000 -XX:PerMethodTrapLimit=100 -XX:+IgnoreUnrecognizedVMOptions >> 37: * -XX:-StressShortRunningLongLoop > > Why was this necessary? A comment could be nice. That one doesn't appear to be needed anymore. I removed it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2077609428 From dnsimon at openjdk.org Wed May 7 13:25:15 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 7 May 2025 13:25:15 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergő
Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25088#pullrequestreview-2821736057 From roland at openjdk.org Wed May 7 13:25:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 May 2025 13:25:23 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: References: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> Message-ID: On Wed, 23 Apr 2025 09:17:21 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: >> >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge >> - Merge branch 'master' into JDK-8342692 >> - Merge branch 'master' into JDK-8342692 >> - whitespace >> - Merge branch 'master' into JDK-8342692 >> - TestMemorySegment test fix >> - ... and 31 more: https://git.openjdk.org/jdk/compare/dc5c4148...065abb29 > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 799: > >> 797: IRNode.ADD_VI, "> 0", >> 798: IRNode.STORE_VECTOR, "> 0"}, >> 799: applyIfAnd = { "ShortRunningLongLoop", "true", "AlignVector", "false" }, > > Can you just copy the IR rule, please, so that we still have a failing rule without `ShortRunningLongLoop`? > > The reason I have it here is so that I will catch these cases that are currently not properly vectorized... and it would be a shame if we lost these tests. 
> > Also: can we whitelist `ShortRunningLongLoop` for the IR framework? I think we should make sure that we run all these MemorySegment tests with `ShortRunningLongLoop` enabled and disabled, just to make sure everything is ok with and without. > > What do you think? > > FYI: I'm making changes to this test again in https://github.com/openjdk/jdk/pull/24278. But I don't want to hold you back here with that. > > Still: maybe you can take my approach with `NoSpeculativeAliasingCheck`, and add a run with `ShortRunningLongLoop` enabled or disabled. Just to make sure we have at least something running with both enabled and also with disabled. Wouldn't I then need to duplicate every `@run` line in the test i.e.: @run driver compiler.loopopts.superword.TestMemorySegment ByteArray @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector would become: @run driver compiler.loopopts.superword.TestMemorySegment ByteArray @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector @run driver compiler.loopopts.superword.TestMemorySegment ByteArray ShortLoop @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector ShortLoop Same for `CharArray` etc... That seems like a lot of extra complexity. Or would it be sufficient to only add it for `ByteArray` to have the non short loop case at least minimally covered? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2077618768 From roland at openjdk.org Wed May 7 13:30:35 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 May 2025 13:30:35 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. 
When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depend on old predicates > added before the transformation. 
To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: - review - Merge branch 'master' into JDK-8342692 - merge fix - Merge branch 'master' into JDK-8342692 - merge fix - Merge branch 'master' into JDK-8342692 - merge - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - whitespace - ... and 33 more: https://git.openjdk.org/jdk/compare/4458719a...ed774a56 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=15 Stats: 1311 lines in 24 files changed: 1252 ins; 13 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Wed May 7 13:30:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 7 May 2025 13:30:36 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v9] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 09:29:21 GMT, Emanuel Peter wrote: >> @eme64 yes, it's ready for review. > > @rwestrel I'll have a look at the predicate code later, I'm a little scared of the complexity there. But maybe we cannot do better to avoid circularity...? 
> @chhagedorn You definitely need to eventually look at the predicate code, you're the expert here :) @eme64 I pushed an update that should address your other comments ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2858608620 From yzheng at openjdk.org Wed May 7 13:36:16 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 May 2025 13:36:16 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergő Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/25088#pullrequestreview-2821790193 From adinn at openjdk.org Wed May 7 13:38:26 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 13:38:26 GMT Subject: RFR: 8356085: JVM crash: Internal Error (codeBuffer.cpp:1005), pid=65197, tid=29187 Message-ID: This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. 
------------- Commit messages: - 8356085 JVM crash: Internal Error (codeBuffer.cpp:1005), pid=65197, tid=29187 Changes: https://git.openjdk.org/jdk/pull/25094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25094&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356085 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25094/head:pull/25094 PR: https://git.openjdk.org/jdk/pull/25094 From shade at openjdk.org Wed May 7 13:49:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 May 2025 13:49:14 GMT Subject: RFR: 8356085: JVM crash: Internal Error (codeBuffer.cpp:1005), pid=65197, tid=29187 In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. Looks fine. I suggest to rename the issue to something more relevant. It is also fairly wild (although somewhat understandable) that we build build-jdk in a different configuration, but it is not a concern for this bugfix. ------------- PR Review: https://git.openjdk.org/jdk/pull/25094#pullrequestreview-2821835020 From kvn at openjdk.org Wed May 7 14:16:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 14:16:16 GMT Subject: RFR: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. 
The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. Should we do this for all platforms? ------------- PR Review: https://git.openjdk.org/jdk/pull/25094#pullrequestreview-2821929958 From bkilambi at openjdk.org Wed May 7 14:20:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 7 May 2025 14:20:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations Message-ID: This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. Testing: JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` ------------- Commit messages: - 8355585: Aarch64: Add aarch64 backend for Float16 vector operations Changes: https://git.openjdk.org/jdk/pull/25096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355585 Stats: 1198 lines in 8 files changed: 495 ins; 0 del; 703 mod Patch: https://git.openjdk.org/jdk/pull/25096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25096/head:pull/25096 PR: https://git.openjdk.org/jdk/pull/25096 From kvn at openjdk.org Wed May 7 14:30:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 14:30:27 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. 
>> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... 
and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab In `premain` branch we don't have null in R12 because we set shift together with base when we generate AOT code: [compressedOops.cpp#L51](https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/oops/compressedOops.cpp#L51) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2858806920 From kvn at openjdk.org Wed May 7 14:43:15 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 14:43:15 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> Message-ID: On Wed, 7 May 2025 09:52:23 GMT, Andrew Dinn wrote: >>> I can recreate this test locally. Looking into it. >> >> Looks like the code for C2 blobs generated in assembly phase is not correct. >> For example, code for new_instance blob is: >> >> >> [0.141s][550409][aot,codecache,stubs] Decoding CodeBlob, name: C2 Runtime new_instance, at [0x00007fed0f904260, 0x00007fed0f9042b8] 88 bytes >> [0.141s][550409][aot,codecache,stubs] ;; N1: # out( B1 ) <- in( B3 B2 ) Freq: 1 >> [0.141s][550409][aot,codecache,stubs] ;; B1: # out( B3 B2 ) <- BLOCK HEAD IS JUNK Freq: 1 >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904260: sub $0x8,%rsp >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904267: mov %rbp,(%rsp) >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90426b: mov %rsp,0x3e8(%r15) >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904272: mov %rsi,%rdi >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904275: mov %r15,%rsi >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904278: movabs $0x7fed28908982,%r10 >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904282: call *%r10 >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f904285: nopl 0x0(%rax,%rax,1) >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90428d: mov %r12,0x3e8(%r15) >> 
[0.141s][550409][aot,codecache,stubs] 0x00007fed0f904294: mov %r12,0x3f0(%r15) >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90429b: mov 0x440(%r15),%rax >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042a2: mov %r12,0x440(%r15) >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042a9: cmp 0x8(%r15),%r12 >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042ad: jne 0x00007fed0f9042b1 >> [0.141s][550409][aot,codecache,stubs] ;; B2: # out( N1 ) <- in( B1 ) Freq: 0.999999 >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042af: pop %rbp >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b0: ret >> [0.141s][550409][aot,codecache,stubs] ;; B3: # out( N1 ) <- in( B1 ) Freq: 1e-06 >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b1: pop %rbp >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b2: jmp Stub::forward_exception >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042b7: hlt >> >> Look at the instructions generated after the call to runtime method: >> >> >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f90428d: mov %r12,0x3e8(%r15) >> [0.141s][550409][aot,codecache,stubs] 0x00007fed0f9042... > >> I suspect I missed porting a change from premain. @adinn @vnkozlov any idea what that could be? > > @ashu-mehra Just to explain what is going on here: > > This is a performance trick. When compressed oops base is null r12 (aka rheapbase) will have been initialized to zero so it can be used as a zero register. This allows, for example, a move instruction to employ a register operand immediate rather than include a 64 bit zero value in the instruction stream, which results in reduced code size. In this case the two moves are zeroing the current Java thread's frame anchor fields, last Java frame pc and sp, which are only set while the thread is in native. 
> > This trick is fine when generated code is run in the same JVM but no use if the code is generated in a VM with zero compressed oops base then reloaded into a JVM where it is no longer null. > > So, Vladimir's advice is to disable this trick when generating AOT code. @adinn Do you remember why we commented `UseCompatibleCompressedOops` setting?: https://github.com/openjdk/leyden/commit/478f86f9cd6df6b92c037c83d0540b9c5fe97e5c It is still not enabled - how we are not crashing in premain? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2858861733 From gbarany at openjdk.org Wed May 7 14:45:13 2025 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 7 May 2025 14:45:13 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. Thanks for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25088#issuecomment-2858866796 From duke at openjdk.org Wed May 7 14:45:13 2025 From: duke at openjdk.org (duke) Date: Wed, 7 May 2025 14:45:13 GMT Subject: RFR: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed.
@gergo- Your change (at version 8028476c2e28e2c168676209260fa68194f74cf1) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25088#issuecomment-2858870106 From gbarany at openjdk.org Wed May 7 14:52:20 2025 From: gbarany at openjdk.org (Gergö Barany) Date: Wed, 7 May 2025 14:52:20 GMT Subject: Integrated: 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:17:52 GMT, Gergö Barany wrote: > Remove special cases in `nmethod::is_deopt_entry` and `nmethod::is_deopt_mh_entry`. Graal used to generate a different code pattern from C2 for deopt handlers. This was changed in https://github.com/oracle/graal/commit/099f57b58edb23ed2184c11badea24edf36f30d2 to align Graal's code generation with C2. The special cases are no longer needed. This pull request has now been integrated. Changeset: 90f0f1b8 Author: Gergö Barany Committer: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/90f0f1b88badbf1f72d7b9434621457aa47cde30 Stats: 11 lines in 1 file changed: 0 ins; 9 del; 2 mod 8354443: [Graal] crash after deopt in TestG1BarrierGeneration.java Reviewed-by: dnsimon, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25088 From rcastanedalo at openjdk.org Wed May 7 14:55:29 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 7 May 2025 14:55:29 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> Message-ID: On Wed, 30 Apr 2025 10:17:34 GMT, Daniel Lundén wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading.
We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Updates after reviews Thanks for doing this, Daniel! `insert_anti_dependences` is indeed easier to understand after your proposed cleanups and additional comments. I have a few questions and suggestions. Please also update the reference to `insert_anti_dependencies` in `src/hotspot/share/adlc/output_h.cpp`.
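For readers following the rename, the "raise the load's LCA" step from the changeset description can be pictured as a lowest-common-ancestor query in the dominator tree. The sketch below is a simplified model under stated assumptions, not the code in gcm.cpp: the class `DomTreeSketch`, its array-based tree encoding, and the method names are all hypothetical, and it ignores the marks, worklist, and anti-dependence edges that the real function manages.

```java
// Simplified, hypothetical model of LCA raising in a dominator tree.
// Blocks are integers; idom[b] is the immediate dominator of block b
// (the root is its own idom) and depth[b] is b's distance from the root.
final class DomTreeSketch {

    /** Lowest common ancestor of blocks a and b in the dominator tree. */
    static int domLca(int[] idom, int[] depth, int a, int b) {
        // Walk the deeper block up until both are at the same depth.
        while (depth[a] > depth[b]) {
            a = idom[a];
        }
        while (depth[b] > depth[a]) {
            b = idom[b];
        }
        // Walk both up in lockstep until they meet.
        while (a != b) {
            a = idom[a];
            b = idom[b];
        }
        return a;
    }

    /**
     * Assumed model of "raising the LCA above" a block holding an
     * anti-dependent store: the new LCA dominates both the old LCA
     * and the store's block.
     */
    static int raiseLcaAbove(int[] idom, int[] depth, int lca, int storeBlock) {
        return domLca(idom, depth, lca, storeBlock);
    }
}
```

With a tree where block 0 is the root, 1 is its child, 2 and 3 are children of 1, and 4 is a child of 2 (`idom = {0, 0, 1, 1, 2}`, `depth = {0, 1, 2, 2, 3}`), raising an LCA of 4 above a store in block 3 yields block 1. Raising an LCA of 2 above a store in block 2 leaves it at 2, which matches the "has no effect" case discussed in the inline comments.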
src/hotspot/share/opto/gcm.cpp line 667: > 665: > 666: //------------------------raise_above_anti_dependences--------------------------- > 667: // The argument load has a current scheduling range in the dominator tree that Could you start this long comment with a one-sentence summary of what the function does? I.e. something like "Enforce a scheduling of the given load where its input memory state is not overwritten by an anti-dependent store". src/hotspot/share/opto/gcm.cpp line 710: > 708: // B4, which means that the updated LCA is B2. Now, consider the store in B2. > 709: // Raising the LCA above B2 has no effect, because B2 is on the dominator tree > 710: // branch between early and the current LCA (in fact, B2 is the current LCA). I found this sentence a bit unclear, could you clarify what you mean by "has no effect"? src/hotspot/share/opto/gcm.cpp line 723: > 721: // edges back to the load. The caller is expected to eventually schedule the > 722: // load in the LCA, but may also hoist the load above the LCA, if it is not the > 723: // early block. What code expects the caller to schedule the load in the LCA? Maybe rephrase into something more relaxed like e.g. "The caller may schedule the load in the LCA, or it may hoist the load above the LCA, if it is not the early block.". 
src/hotspot/share/opto/gcm.cpp line 758: > 756: > 757: // Note the earliest legal placement of 'load', as determined by > 758: // by the unique point in the dominator tree where all memory effects Suggestion: // the unique point in the dominator tree where all memory effects src/hotspot/share/opto/gcm.cpp line 779: > 777: ResourceArea* area = Thread::current()->resource_area(); > 778: > 779: // Bookkeeping of possibly anti-dependent stores that we find outside of the Suggestion: // Bookkeeping of possibly anti-dependent stores that we find below the src/hotspot/share/opto/gcm.cpp line 785: > 783: Node_List non_early_stores(area); > 784: > 785: // Flag that indicates if we must attempt to raise the LCA after the main Suggestion: // Whether we must attempt to raise the LCA after the main Also, could you clarify what you mean by "attempt"? Could LCA raising fail somehow? src/hotspot/share/opto/gcm.cpp line 803: > 801: // MergeMems do not modify the memory state. Anti-dependent stores or memory > 802: // Phis may, however, exist downstream of MergeMems. Therefore, we must > 803: // permit the search to continue through MergeMems. Memory-state-modifying Now that you have already explained that "memory-state-modifying nodes" are also referred to as "stores", you could stick to using "stores" for brevity. src/hotspot/share/opto/gcm.cpp line 850: > 848: // - just past a MergeMem with the edge (MergeMem, use_mem_state). > 849: // we have passed a MergeMem and are now at an edge > 850: // (MergeMem, use_mem_state). Are these two lines intended to be here? src/hotspot/share/opto/gcm.cpp line 853: > 851: assert(def_mem_state == nullptr || def_mem_state == initial_mem || > 852: def_mem_state->is_MergeMem(), > 853: "invariant failed"); Suggestion: "unexpected memory state"); src/hotspot/share/opto/gcm.cpp line 892: > 890: > 891: // At this point, use_mem_state is either a store or a memory Phi. 
> 892: assert(!use_mem_state->is_MergeMem(), "invariant failed"); Suggestion: assert(!use_mem_state->is_MergeMem(), "use_mem_state should be either a store or a memory Phi"); src/hotspot/share/opto/gcm.cpp line 951: > 949: // which we must raise the LCA above (set_raise_LCA_mark), and keep > 950: // track of nodes that potentially need anti-dependence edges > 951: // (non_early_stores). The only exceptions to this is if we Suggestion: // (non_early_stores). The only exceptions to this are if we src/hotspot/share/opto/gcm.cpp line 957: > 955: // > 956: // After the worklist loop, we perform an efficient combined LCA-raising > 957: // operation over all marks and then only add anti-dependence edges where Suggestion: // operation over all marks and only then add anti-dependence edges where src/hotspot/share/opto/gcm.cpp line 1014: > 1012: pred_block->set_raise_LCA_mark(load_index); > 1013: must_raise_LCA = true; > 1014: } else /* if (pred_block == early */ { Suggestion: } else /* if (pred_block == early) */ { src/hotspot/share/opto/gcm.cpp line 1052: > 1050: } > 1051: } > 1052: // (Worklist is now empty; we have visited all possible anti-dependences.) Suggestion: // Worklist is now empty; we have visited all possible anti-dependences. test/hotspot/jtreg/compiler/loopopts/TestSplitIfPinnedLoadInStripMinedLoop.java line 141: > 139: > 140: // Same as test2 but with reference to inner loop induction variable 'j' and different order of instructions. > 141: // Triggered an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification was disabled: If the proposed assertions are stronger than the one in mainline, there is no need to rewrite this sentence in past tense, in my opinion. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2821989713 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077769631 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077771523 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077772673 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077774045 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077776098 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077778886 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077783735 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077786973 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077789623 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077792516 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077793880 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077794806 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077796052 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077813208 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2077819729 From adinn at openjdk.org Wed May 7 15:02:23 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 15:02:23 GMT Subject: RFR: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 14:13:22 GMT, Vladimir Kozlov wrote: > Should we do this for all platforms? That's quite an interesting question. The immediate answer is no. Only aarch64 was specifying extra space for ZGC. So, there is nothing more to do. The more interesting question is Q1) Why did aarch64 originally do this? The bonus question is Q2) Which ports allocate extra space for ZGC in other stubgen buffers? Q1. 
The only reason a buffer may need more space for ZGC is because the stubs it includes perform object reads or writes -- ZGC injects more barrier instructions than other GCs. It seems that the only candidates are the array copy stubs. On aarch64 these used to be generated in the compiler stubs buffer but the reorg moved them to final stubs as per other arches. That's why the extra ZGC space is redundant. Q2. Only aarch64, riscv and x86 allocate extra storage in the final stubs: src/hotspot/cpu/aarch64/stubDeclarations_aarch64.hpp: do_arch_blob(final, 20000 ZGC_ONLY(+60000)) src/hotspot/cpu/riscv/stubDeclarations_riscv.hpp: do_arch_blob(final, 20000 ZGC_ONLY(+10000)) src/hotspot/cpu/x86/stubDeclarations_x86.hpp: do_arch_blob(final, 31000 WINDOWS_ONLY(+22000) ZGC_ONLY(+20000)) I believe the other ports generate fewer copy routines overall and/or share more of the generated code so the size disparity with the default (G1) case is not enough to require an additional allocation (the size of the ZGC barriers may also be smaller for these ports -- not sure). src/hotspot/cpu/arm/stubDeclarations_arm.hpp: do_arch_blob(final, 22000) src/hotspot/cpu/ppc/stubDeclarations_ppc.hpp: do_arch_blob(final, 24000) src/hotspot/cpu/s390/stubDeclarations_s390.hpp: do_arch_blob(final, 20000) Bottom line: it doesn't look like there is any more work to do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25094#issuecomment-2858937162 From asmehra at openjdk.org Wed May 7 15:06:19 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 7 May 2025 15:06:19 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> Message-ID: On Wed, 7 May 2025 14:40:29 GMT, Vladimir Kozlov wrote: > It is still not enabled - how we are not crashing in premain? That's news to me.
I thought we were setting `UseCompatibleCompressedOops` to true which is why we are avoiding this issue in premain. But .. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2858954202 From adinn at openjdk.org Wed May 7 15:34:15 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 15:34:15 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> Message-ID: On Wed, 7 May 2025 14:40:29 GMT, Vladimir Kozlov wrote: >>> I suspect I missed porting a change from premain. @adinn @vnkozlov any idea what that could be? >> >> @ashu-mehra Just to explain what is going on here: >> >> This is a performance trick. When compressed oops base is null r12 (aka rheapbase) will have been initialized to zero so it can be used as a zero register. This allows, for example, a move instruction to employ a register operand immediate rather than include a 64 bit zero value in the instruction stream, which results in reduced code size. In this case the two moves are zeroing the current Java thread's frame anchor fields, last Java frame pc and sp, which are only set while the thread is in native. >> >> This trick is fine when generated code is run in the same JVM but no use if the code is generated in a VM with zero compressed oops base then reloaded into a JVM where it is no longer null. >> >> So, Vladimir's advice is to disable this trick when generating AOT code. > > @adinn Do you remember why we commented `UseCompatibleCompressedOops` setting?: > https://github.com/openjdk/leyden/commit/478f86f9cd6df6b92c037c83d0540b9c5fe97e5c > > It is still not enabled - how we are not crashing in premain? @vnkozlov Ergonomics comes into play. It is set to true in cdsConfig.cpp if CacheDataStore != nullptr. 
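As a side illustration of the r12 shortcut quoted above, the following toy model shows why a stub that stores r12 as "zero" misbehaves once the heap base is non-zero at run time. It is only a sketch under stated assumptions: real stubs are machine code, and the class `HeapBaseRegisterModel` and its methods are hypothetical names invented for this example.

```java
// Toy model only: r12 is represented by a long holding the compressed-oops
// heap base that is live when the (previously generated) code executes.
final class HeapBaseRegisterModel {

    /** Value written by "mov %r12, <field>": whatever r12 holds at run time. */
    static long storeViaR12(long r12AtRun) {
        return r12AtRun;
    }

    /** Value written by a store of an explicit zero; independent of r12. */
    static long storeExplicitZero(long r12AtRun) {
        return 0L;
    }

    /** The shortcut clears the field only if the heap base is zero when the code runs. */
    static boolean shortcutClearsField(long heapBaseAtRun) {
        return storeViaR12(heapBaseAtRun) == 0L;
    }
}
```

In this model, code generated while the heap base was zero still clears the frame anchor fields if it later runs with a zero base, but writes a stale non-zero value if it is reloaded into a VM whose base is, say, `0x800000000L`. The explicit-zero store is larger but correct in both cases, which is the trade-off behind the advice to disable the shortcut when generating AOT code.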
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859064405 From yzheng at openjdk.org Wed May 7 15:39:20 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 7 May 2025 15:39:20 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: References: Message-ID: <4qUlnS5IhZxUDg2w5C3aAo_saQ1IXSnbkmSNwpgzpes=.092d9c5d-836d-41d9-aa9b-e94c4520fea7@github.com> On Wed, 7 May 2025 11:40:05 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback.
>> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: > > - Making _features_bitmap size configurable > - cleanups & refactorings JVMCI changes look good. Will run some Graal tests on this PR src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 121: > 119: long featureIndex = bitIndex >>> featuresElementShiftCount; > 120: long featureBitMask = 1L << (bitIndex & featuresElementMask); > 121: assert featureIndex < featuresBitMapSize; `featuresBitMapSize` is size in bytes while `featureIndex` is index to long array ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2822266780 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2077922290 From asmehra at openjdk.org Wed May 7 15:40:16 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 7 May 2025 15:40:16 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> Message-ID: <8z5J8AajcQA7YjCT8k4MFzgtN7el8CablPhGPvau9yY=.dac69bdc-6466-4415-8105-fea569573668@github.com> On Wed, 7 May 2025 15:31:35 GMT, Andrew Dinn wrote: >> @adinn Do you remember why we commented `UseCompatibleCompressedOops` setting?: >> https://github.com/openjdk/leyden/commit/478f86f9cd6df6b92c037c83d0540b9c5fe97e5c >> >> It is still not enabled - how we are not crashing in premain? > > @vnkozlov Ergonomics comes into play. It is set to true in cdsConfig.cpp if CacheDataStore != nullptr. 
@adinn in premain it is commented out - https://github.com/openjdk/leyden/blob/f09d2f7724c628c90df51eacb16b33fee710ed1a/src/hotspot/share/cds/cdsConfig.cpp#L790 #ifdef _LP64 // FLAG_SET_ERGO_IF_DEFAULT(UseCompatibleCompressedOops, true); // FIXME @iklam - merge with mainline - UseCompatibleCompressedOops #endif I don't see any other place it is set. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859083913 From adinn at openjdk.org Wed May 7 15:46:19 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 15:46:19 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: <8z5J8AajcQA7YjCT8k4MFzgtN7el8CablPhGPvau9yY=.dac69bdc-6466-4415-8105-fea569573668@github.com> References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> <8z5J8AajcQA7YjCT8k4MFzgtN7el8CablPhGPvau9yY=.dac69bdc-6466-4415-8105-fea569573668@github.com> Message-ID: <82gBa1jyJx2ufjpp1VfS5j9KeivBL9N5M8Rf-3pL8RY=.c7e5f335-a391-4d87-a1a1-808b92753f37@github.com> On Wed, 7 May 2025 15:37:54 GMT, Ashutosh Mehra wrote: > I don't see any other place it is set. I do ;-) Look at cdsConfig.cpp:468 in method CDSConfig::check_vm_args_consistency() if (CacheDataStore != nullptr) { // Leyden temp work-around: // // By default, when using CacheDataStore, use the HeapBasedNarrowOop mode so that // AOT code can be always work regardless of runtime heap range. // // If you are *absolutely sure* that the CompressedOops::mode() will be the same // between training and production runs (e.g., if you specify -Xmx128m // for both training and production runs, and you know the OS will always reserve // the heap under 4GB), you can explicitly disable this with: // java -XX:-UseCompatibleCompressedOops -XX:CacheDataStore=... // However, this is risky and there's a chance that the production run will be slower // because it is unable to load the AOT code cache. 
#ifdef _LP64 FLAG_SET_ERGO_IF_DEFAULT(UseCompatibleCompressedOops, true); // <== here #endif ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859103569 From kvn at openjdk.org Wed May 7 15:50:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 15:50:14 GMT Subject: RFR: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. Okay. Should we just remove ZGC_ONLY() part without increase buffer's size? ------------- PR Review: https://git.openjdk.org/jdk/pull/25094#pullrequestreview-2822340818 From kvn at openjdk.org Wed May 7 15:55:13 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 15:55:13 GMT Subject: RFR: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. I read JBS comments and understand that we do need extra space for compiler's stubs on aarch64 regardless ZGC. Approved. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25094#pullrequestreview-2822353577 From kvn at openjdk.org Wed May 7 16:06:17 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 16:06:17 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: <82gBa1jyJx2ufjpp1VfS5j9KeivBL9N5M8Rf-3pL8RY=.c7e5f335-a391-4d87-a1a1-808b92753f37@github.com> References: <7e6TPADKIO-d9cqpuhk-O-4bEX1esJsQdFtztwF5gcU=.8df43b49-7e62-4aa4-8f14-184b9376467b@github.com> <8z5J8AajcQA7YjCT8k4MFzgtN7el8CablPhGPvau9yY=.dac69bdc-6466-4415-8105-fea569573668@github.com> <82gBa1jyJx2ufjpp1VfS5j9KeivBL9N5M8Rf-3pL8RY=.c7e5f335-a391-4d87-a1a1-808b92753f37@github.com> Message-ID: On Wed, 7 May 2025 15:43:55 GMT, Andrew Dinn wrote: >> @adinn in premain it is commented out - >> https://github.com/openjdk/leyden/blob/f09d2f7724c628c90df51eacb16b33fee710ed1a/src/hotspot/share/cds/cdsConfig.cpp#L790 >> >> >> #ifdef _LP64 >> // FLAG_SET_ERGO_IF_DEFAULT(UseCompatibleCompressedOops, true); // FIXME @iklam - merge with mainline - UseCompatibleCompressedOops >> #endif >> >> >> I don't see any other place it is set. > >> I don't see any other place it is set. > > I do ;-) > > Look at cdsConfig.cpp:468 in method CDSConfig::check_vm_args_consistency() > > > > if (CacheDataStore != nullptr) { > // Leyden temp work-around: > // > // By default, when using CacheDataStore, use the HeapBasedNarrowOop mode so that > // AOT code can be always work regardless of runtime heap range. > // > // If you are *absolutely sure* that the CompressedOops::mode() will be the same > // between training and production runs (e.g., if you specify -Xmx128m > // for both training and production runs, and you know the OS will always reserve > // the heap under 4GB), you can explicitly disable this with: > // java -XX:-UseCompatibleCompressedOops -XX:CacheDataStore=... 
> // However, this is risky and there's a chance that the production run will be slower > // because it is unable to load the AOT code cache. > #ifdef _LP64 > FLAG_SET_ERGO_IF_DEFAULT(UseCompatibleCompressedOops, true); // <== here > #endif @adinn I think you are looking on old version of `premain` branch. The latest changeset is Igor's "Address review comments". ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859162780 From sparasa at openjdk.org Wed May 7 16:12:46 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 7 May 2025 16:12:46 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result Message-ID: This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. The test passes after using this fix as shown below: Passed: compiler/c2/irTests/TestFPComparison.java Test results: passed: 1 ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR SKIP jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java 1 1 0 0 0 ============================== TEST SUCCESS ------------- Commit messages: - JDK-8356281: Fix for TestFPComparison failure due to incorrect result Changes: https://git.openjdk.org/jdk/pull/25101/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25101&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356281 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/25101.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25101/head:pull/25101 PR: https://git.openjdk.org/jdk/pull/25101 From adinn at openjdk.org Wed May 7 16:13:18 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 16:13:18 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: <7MtJxoIoftDUXjZEGtVUX-_Yct3XfjbT65fSztm_FBA=.65168a5b-53fd-4fdc-a128-18d2fc1ade97@github.com> On Mon, 5 May 2025 21:13:24 GMT, Ashutosh 
Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab Ah yes, I was looking at an old version. Apologies for that. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859183350 From adinn at openjdk.org Wed May 7 16:38:15 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 7 May 2025 16:38:15 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... 
and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab I looked back at the history and found that this line got commented out as part of a merge from mainline (date: 2025-01-15 shaid 4cd4c7cdfae7b4b5eb3308abc8fa8d0ed7581ad8). I'm not at all sure why Ioi did this and have no idea why he added my name to the original FIXME. The very next change he made in the premain branch was to update the FIXME comment, resetting it to his name. I also don't understand why this is not biting us in premain. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859254016 From kvn at openjdk.org Wed May 7 16:44:23 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 16:44:23 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. 
> Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab There could be testing failures which expect some state of compressed oops. I will test enabling this flag. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859268199 From cjplummer at openjdk.org Wed May 7 17:03:22 2025 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 7 May 2025 17:03:22 GMT Subject: RFR: 8355003: Implement Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 21:50:34 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments SA changes look good. ------------- Marked as reviewed by cjplummer (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2822536705 From shade at openjdk.org Wed May 7 17:08:14 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 7 May 2025 17:08:14 GMT Subject: RFR: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25094#pullrequestreview-2822549813 From sparasa at openjdk.org Wed May 7 18:55:18 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 7 May 2025 18:55:18 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v20] In-Reply-To: References: Message-ID: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: undo demotion for eimul for RRImm ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/ca6b83a3..228936e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=18-19 Stats: 47 lines in 4 files changed: 14 ins; 24 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Wed May 7 19:00:59 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 7 May 2025 19:00:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v21] In-Reply-To: References: Message-ID: <_2u1KhRhzp2BeKn9MaUQhSPH8OXCWturNkj21xz5nn4=.c40c4371-4b11-4ee6-8ef3-c29f9c6723a5@github.com> > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. 
Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: eimull revert fully to original version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/228936e2..8761f770 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=19-20 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From kvn at openjdk.org Wed May 7 19:11:02 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 19:11:02 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: <1shOfSgE57gegdChqv-NY00cKuBQrI2Fq-UEZ5w0gR4=.123963df-22fe-4b43-a45f-659bc504c064@github.com> On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab We were lucky. I reproduced the issue in `premain` with HelloWorld by running `AOTMode=create -Xmx4G` and the production run with `Xmx31g`. It has the same `shift = 3` but a different base (0 for 4Gb). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859928242 From kvn at openjdk.org Wed May 7 19:34:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 19:34:53 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`.
> > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab Unfortunately we can't enable the flag (that is why Ioi commented it, I think): # Internal Error (/leyden/open/src/hotspot/share/oops/compressedOops.cpp:87), pid=1169225, tid=1169229 # assert((intptr_t)base() <= ((intptr_t)_heap_address_range.start() - (intptr_t)os::vm_page_size()) || base() == nullptr) failed: invalid value # # JRE version: (25.0) (fastdebug build ) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 25-internal-LTS-2025-05-07-1649004.vkozlov..., mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0xba81e9] CompressedOops::initialize(ReservedHeapSpace const&)+0x3c9 @ashu-mehra please restore verification check for COOP base you had. 
instead of patching `x86_64.ad` file ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2859997830 From kvn at openjdk.org Wed May 7 19:37:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 19:37:54 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab I think it is fine not use AOT code when the heap base does not match. 
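To make the base/shift point in this thread concrete, here is a minimal Java sketch (not HotSpot code) of the usual compressed-reference decoding, `address = base + (narrow << shift)`. The class name, method names and the non-zero base value are hypothetical and chosen only for illustration. The sketch shows why code that had one base baked in at assembly time decodes to the wrong address when the production run uses another base, even though the shift is the same.

```java
// Illustrative only: all names and the non-zero base value are assumptions,
// not taken from HotSpot sources.
final class NarrowOopSketch {
    // Decode a 32-bit compressed reference into a 64-bit address.
    static long decode(long base, int shift, int narrow) {
        return base + ((narrow & 0xFFFFFFFFL) << shift);
    }

    // Hypothetical compatibility check: cached code is only reusable if the
    // encoding parameters recorded at assembly time match the runtime ones.
    static boolean canReuse(long cachedBase, int cachedShift,
                            long runtimeBase, int runtimeShift) {
        return cachedBase == runtimeBase && cachedShift == runtimeShift;
    }

    public static void main(String[] args) {
        int narrow = 0x1000;
        long zeroBase = 0L;            // small heap: zero-based mode
        long heapBase = 0x800000000L;  // hypothetical base for a large heap
        System.out.println(Long.toHexString(decode(zeroBase, 3, narrow)));
        System.out.println(Long.toHexString(decode(heapBase, 3, narrow)));
        System.out.println(canReuse(zeroBase, 3, heapBase, 3));
    }
}
```

Under these assumptions, rejecting the cached code on a mismatch is the simple and safe choice: the production run falls back to generating the code itself.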
Based on my test, a mismatch will happen only with a big heap size difference between assembly and production runs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860006997 From epeter at openjdk.org Wed May 7 20:08:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:08:52 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lundén wrote: > Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. > > ### Changeset > > Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). > > The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph.
Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). > > I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before or after this changeset. Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24960#pullrequestreview-2823057214 From epeter at openjdk.org Wed May 7 20:11:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:11:52 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: <0I8vVoHwJofrSc2QzgtYPp965OS3GoNg_mQMvmCZfh0=.cf916f04-fd03-4429-bd9f-6f77abe0b3b0@github.com> On Wed, 7 May 2025 12:23:46 GMT, Daniel Lundén wrote: >>> The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration. Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) which describes the issue. >> >> Fair enough, thanks for the explanation. > > Thanks for the review @robcasloz! > > @eme64 >> @dlunde Ok, then let's declare this as a "quickfix", and file a follow-up RFE.
Maybe it should also be declared a lower-priority bug? > > Sounds good to me. Yes, definitely lower priority for now. We do not even know if the transformation is expected or not, although I agree it looks suspicious. @dlunde Approved, with the assumption that you will file that follow-up RFE and link it with this issue :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2860207012 From epeter at openjdk.org Wed May 7 20:20:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:20:56 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: <7S09X71FtIkY5UB0KtgQaqAYBHXevIjV_ympjaavP-Y=.10627e43-91d8-4bae-8ae1-ef248116cc8a@github.com> On Wed, 7 May 2025 12:26:45 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 105: >> >>> 103: // Lists of Tokens are also allowed: >>> 104: List.of("int ", "d = 5", ";\n"), >>> 105: // That can be great for streaming / mapping over an existing list: >> >> By "that" you just mean the following line? Maybe rephrase to: "We can also stream / map over an existing list or one created on the fly: > > haha, now we kinda removed the list, since we are doing stream direclty. I think I will revert your suggestion here, back to `List.of().stream`, just to make clear that we can do all of that. haha, now we kinda removed the list, since we are doing stream direclty. I think I will revert your suggestion here, back to `List.of().stream`, just to make clear that we can do all of that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078426855 From epeter at openjdk.org Wed May 7 20:20:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:20:57 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: <7S09X71FtIkY5UB0KtgQaqAYBHXevIjV_ympjaavP-Y=.10627e43-91d8-4bae-8ae1-ef248116cc8a@github.com> References: <7S09X71FtIkY5UB0KtgQaqAYBHXevIjV_ympjaavP-Y=.10627e43-91d8-4bae-8ae1-ef248116cc8a@github.com> Message-ID: <8BkFI4DKHxtHjXAQ5xpVwb988YVmR3UDXUkqntNEOAM=.c3ea0836-152a-4f66-aef8-c52bdbc38efd@github.com> On Wed, 7 May 2025 20:16:16 GMT, Emanuel Peter wrote: >> haha, now we kinda removed the list, since we are doing stream direclty. I think I will revert your suggestion here, back to `List.of().stream`, just to make clear that we can do all of that. > > haha, now we kinda removed the list, since we are doing stream direclty. I think I will revert your suggestion here, back to `List.of().stream`, just to make clear that we can do all of that. I like your suggestion, applied! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078429602 From epeter at openjdk.org Wed May 7 20:24:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:24:01 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:51:45 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 61: > >> 59: CompileFramework comp = new CompileFramework(); >> 60: >> 61: // Add java source files. > > Maybe it would also be nice to see the actually generated strings for the templates. Should we add an easy way to do this just for the tutorials in this file? Maybe we can do it by asking the user to pass an environment property like `-DPrintTemplates=true` or something like that. Or is there already a way provided by the framework to print the resulting templates on demand? There is `-DCompileFrameworkVerbose=true`, which will print the code that the CompileFramework compiles. I think that would be good enough, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078434343 From epeter at openjdk.org Wed May 7 20:32:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:32:14 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v12] In-Reply-To: References: Message-ID: <1iJUastuyTcf7qIxmkVtOFPkfvw2uLqZvETG_V2UOAo=.ea376a1d-4645-462d-a375-735e9072209f@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). 
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: More for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/f689a902..b161b662 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=10-11 Stats: 24 lines in 2 files changed: 8 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 7 20:32:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:32:15 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> References: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> Message-ID: On Wed, 7 May 2025 12:19:20 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 179: >> >>> 177: * current code scope with {@link #addName}, and sample from the current or outer scopes with {@link #sampleName}. >>> 178: * When generating code, one might want to create {@link Name}s (variables, fields, etc) in local scope, or >>> 179: * in some outer scope with the use of {@link Hook}s. >> >> Maybe mention here again that all of the explained above can be found in tutorial like examples (I guess in `TestTutorial`)?. 
Because it was not that easy to grasp how these different options to create Templates now work in practice. > > Ok, fair. This is just a high level explanation. Especially Hooks and Names are also not the "starter features", I think. So it's ok if you have to go look at the examples or other uses, I think. I linked to the examples like `TestTutorial`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078444613 From coleenp at openjdk.org Wed May 7 20:33:56 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 7 May 2025 20:33:56 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. 
But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Move to oops This is a cleaner way to do this. I believe it's what we discussed with Kim. He can confirm. Some questions and comments and a small nit. src/hotspot/share/compiler/compileBroker.cpp line 1697: > 1695: JavaThread* thread = JavaThread::current(); > 1696: > 1697: methodHandle method(thread, task->method()); I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > 33: #include "oops/weakHandle.inline.hpp" > 34: > 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { This should initialize method in the ctor initializer list. src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > 49: // Method holder class cannot be unloaded. > 50: return nullptr; > 51: } This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. src/hotspot/share/runtime/vmStructs.cpp line 1266: > 1264: declare_toplevel_type(CDSFileMapRegion) \ > 1265: declare_toplevel_type(UpcallStub::FrameData) \ > 1266: declare_toplevel_type(UnloadableMethodHandle) \ So are these left for the async profiler? ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2823027214 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078430169 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078443576 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078379288 PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2078446115 From epeter at openjdk.org Wed May 7 20:40:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:40:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:54:00 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 131: > >> 129: "System.out.println(", arg, ");\n", // capture arg via lambda argument >> 130: "System.out.println(#arg);\n", // capture arg via hashtag replacement >> 131: "if (#arg != ", arg, ") { throw new RuntimeException(\"mismatch\"); }\n" > > When should I use the lambda argument and when the hashtag replacement? Maybe add a comment here for some guidance or link to later tutorials where it becomes obvious. Honestly, I don't yet have a clear answer for this. Hmm. I'm not sure this is the best place to give this guidance. 
I guess the difference is to use a separate "token" vs a hashtag replacement. - token: can paste anything. But it requires you to interrupt the string and add commas. That can be a little clunky. And: you can only do a recursive Template call with the token method. - hashtag: you need it captured as string, either by a template argument or `let`. Does not allow recursive template calls. But it looks a little nicer cosmetically. Is this somewhat helpful? Maybe I can put that somewhere later in the tutorial? What do you think? > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 143: > >> 141: public static void main() { >> 142: """, >> 143: templateHello.withArgs(), > > `withArgs()` looks strange when there are no args. Could we find a better name for it? But maybe I'm missing a pattern here. Hmm, yeah, that is a slight concern. But it does return a `TemplateWithArgs`, which means a template that knows all the arguments already. This one happens to be a zero-arg version. I suppose I could rename it to `withArgsNone()` or `withZeroArgs` or `withNoArgs` for the zero-args version? Would that be an improvement? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078452435 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078456128 From epeter at openjdk.org Wed May 7 20:49:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:49:57 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:59:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > Next batch of comments. Will probably resume tomorrow :-) @chhagedorn Thanks for all the suggestions and questions. I applied most, and responded to a few where we still need to discuss a little :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2860311640 From epeter at openjdk.org Wed May 7 20:49:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:49:57 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> Message-ID: On Wed, 7 May 2025 12:24:04 GMT, Emanuel Peter wrote: >> Plus, it is really clunky to use the much longer `System.lineSeparator()` ? > > I prefer multiline strings, but that does not always work. `\n` is just less of a disturbance. ![image](https://github.com/user-attachments/assets/51e6af52-a8de-4d1f-93fd-f89b910f1310) Boah, I feel like this really looks quite unreadable / distracting. It makes the lines super long. Makes me look for other "hacky" solutions. Maybe some `#\n` string, to make a hashtag replacement that gets you the newline / lineSeparator? ![image](https://github.com/user-attachments/assets/226f7022-c799-4b04-867b-5233d8c9ee7c) That looks a little better. 
This is what I used to have: ![image](https://github.com/user-attachments/assets/3035aacf-ca09-4f5d-868e-d92a0adf2d04) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078465509 From epeter at openjdk.org Wed May 7 20:49:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:49:59 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:35:05 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 131: >> >>> 129: "System.out.println(", arg, ");\n", // capture arg via lambda argument >>> 130: "System.out.println(#arg);\n", // capture arg via hashtag replacement >>> 131: "if (#arg != ", arg, ") { throw new RuntimeException(\"mismatch\"); }\n" >> >> When should I use the lambda argument and when the hashtag replacement? Maybe add a comment here for some guidance or link to later tutorials where it becomes obvious. > > Honestly, I don't yet have a clear answer for this. Hmm. > I'm not sure this is the best place to give this guidance. > > I guess the difference is to use a separate "token" vs a hashtag replacement. > - token: can paste anything. But it requires you to interrupt the string and add commas. That can be a little clunky. And: you can only do a recursive Template call with the token method. > - hashtag: you need it captured as string, either by a template argument or `let`. Does not allow recursive template calls. But it looks a little nicer cosmetically. > > Is this somewhat helpful? Maybe I can put that somewhere later in the tutorial? What do you think? Maybe my guidance would be to prefer hashtag, if need be with a `let`. Especially if it is about inserting something on the same line. If it is on a new line, then the token method looks nicer often. For example if you stream over a list. And recursive Template calls just have to be "tokens". 
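The difference between the two styles can be illustrated with a small stand-alone sketch. This is not the Template Framework API: `renderTokens` and `renderHashtags` are hypothetical helpers written only for this illustration, and the `#name` syntax is modeled with a plain regular expression. The "token" style interrupts the string and passes each piece separately, while the "hashtag" style keeps one string and substitutes named placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenVsHashtag {
    // Hypothetical "token" style: the body is a sequence of separate pieces
    // (strings, numbers, ...) that are pasted together in order.
    public static String renderTokens(List<Object> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Object token : tokens) {
            sb.append(token);
        }
        return sb.toString();
    }

    // Matches "#name" placeholders (assumed syntax for this sketch only).
    private static final Pattern HASHTAG = Pattern.compile("#([A-Za-z_][A-Za-z0-9_]*)");

    // Hypothetical "hashtag" style: one uninterrupted string, where every
    // "#name" is replaced by the string value registered under "name".
    public static String renderHashtags(String body, Map<String, String> replacements) {
        Matcher m = HASHTAG.matcher(body);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = replacements.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no replacement for #" + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String arg = "42";
        // Token style: string is split around the value, pieces separated by commas.
        System.out.print(renderTokens(List.of("System.out.println(", arg, ");\n")));
        // Hashtag style: the value is captured by name and spliced into the string.
        System.out.print(renderHashtags("System.out.println(#arg);\n", Map.of("arg", arg)));
    }
}
```

Both calls in `main` produce the same generated line. In this toy model a token can be any object, whereas a hashtag value must already be a string, which loosely mirrors the trade-off described above.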
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078468295 From epeter at openjdk.org Wed May 7 20:54:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 20:54:53 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: <7SzNLUFr7QE8t33ha1QUtU8KDmuZObcpuvSkAnIelPU=.9132b498-e247-4874-b86d-33aef480d31a@github.com> Message-ID: On Wed, 7 May 2025 20:44:11 GMT, Emanuel Peter wrote: >> I prefer multiline strings, but that does not always work. `\n` is just less of a disturbance. > > ![image](https://github.com/user-attachments/assets/51e6af52-a8de-4d1f-93fd-f89b910f1310) > > Boah, I feel like this really looks quite unreadable / distracting. It makes the lines super long. > > Makes me look for other "hacky" solutions. Maybe some `#\n` string, to make a hashtag replacement that gets you the newline / lineSeparator? > > ![image](https://github.com/user-attachments/assets/226f7022-c799-4b04-867b-5233d8c9ee7c) > That looks a little better. > > This is what I used to have: > ![image](https://github.com/user-attachments/assets/3035aacf-ca09-4f5d-868e-d92a0adf2d04) Or should we just keep the `\n`, and wait until someone actually has an issue with it? Because on all of the platforms we run this on it works, even Windows where a lineSeparator() is supposed to be `\r\n`. 
And I'm a little hesitant to implement something heavy if we are not sure we really need it ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2078475551 From sviswanathan at openjdk.org Wed May 7 21:16:55 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 7 May 2025 21:16:55 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v21] In-Reply-To: <_2u1KhRhzp2BeKn9MaUQhSPH8OXCWturNkj21xz5nn4=.c40c4371-4b11-4ee6-8ef3-c29f9c6723a5@github.com> References: <_2u1KhRhzp2BeKn9MaUQhSPH8OXCWturNkj21xz5nn4=.c40c4371-4b11-4ee6-8ef3-c29f9c6723a5@github.com> Message-ID: On Wed, 7 May 2025 19:00:59 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > eimull revert fully to original version Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2823225963 From epeter at openjdk.org Wed May 7 21:25:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 21:25:53 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: <9lqWA3ERAtAuuUDJNS0gIQDtN-RTOH_C-sxC_4ALH5g=.46c2438c-bdf6-43e1-847d-56c6c51e5454@github.com> On Mon, 28 Apr 2025 15:31:18 GMT, Srinivas Vamsi Parasa wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup ecmov, eorw and other refactoring > > Hi Sandhya (@sviswa7) and Jatin (@jatin-bhateja), > > Could you please review the refactored changes? > > Thanks, > Vamsi @vamsi-parasa @sviswa7 Did you already test this with `sde` and the `-future` flag? Once this is fully reviewed I can also run our internal testing, just let me know when you are ready :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2860419497 From epeter at openjdk.org Wed May 7 21:26:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 7 May 2025 21:26:56 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: References: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> Message-ID: On Wed, 7 May 2025 13:22:30 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 799: >> >>> 797: IRNode.ADD_VI, "> 0", >>> 798: IRNode.STORE_VECTOR, "> 0"}, >>> 799: applyIfAnd = { "ShortRunningLongLoop", "true", "AlignVector", "false" }, >> >> Can you just copy the IR rule, please, so that we still have a failing rule without `ShortRunningLongLoop`? >> >> The reason I have it here is so that I will catch these cases that are currently not properly vectorized... 
and it would be a shame if we lost these tests. >> >> Also: can we whitelist `ShortRunningLongLoop` for the IR framework? I think we should make sure that we run all these MemorySegment tests with `ShortRunningLongLoop` enabled and disabled, just to make sure everything is ok with and without. >> >> What do you think? >> >> FYI: I'm making changes to this test again in https://github.com/openjdk/jdk/pull/24278. But I don't want to hold you back here with that. >> >> Still: maybe you can take my approach with `NoSpeculativeAliasingCheck`, and add a run with `ShortRunningLongLoop` enabled or disabled. Just to make sure we have at least something running with both enabled and also with disabled. > > Wouldn't I then need to duplicate every `@run` line in the test i.e.: > > @run driver compiler.loopopts.superword.TestMemorySegment ByteArray > @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector > > > would become: > > > @run driver compiler.loopopts.superword.TestMemorySegment ByteArray > @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector > @run driver compiler.loopopts.superword.TestMemorySegment ByteArray ShortLoop > @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector ShortLoop > > > Same for `CharArray` etc... > That seems like a lot of extra complexity. Or would it be sufficient to only add it for `ByteArray` to have the non short loop case at least minimally covered? Yeah, I would only do it for one or two cases. Doing it for all would be a little excessive, and eventually we have too many combinations. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2078517077 From asmehra at openjdk.org Wed May 7 21:52:59 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 7 May 2025 21:52:59 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: <9EQGY48c-yRgmnLAPd3wXy1JsQ7xbiyzFXlHZHSuEqY=.94aba91d-a006-4832-967d-fb90da38ce6f@github.com> On Wed, 7 May 2025 19:35:34 GMT, Vladimir Kozlov wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Remove irrelevant comment >> >> Signed-off-by: Ashutosh Mehra >> - Fix win64 compile failures >> >> Signed-off-by: Ashutosh Mehra >> - Fix AOTCodeFlags.java test >> >> Signed-off-by: Ashutosh Mehra >> - Fix compile failure in minimal config >> >> Signed-off-by: Ashutosh Mehra >> - Revert back changes that added AOTRuntimeConstants. >> Ensure CompressedOops::base and CompressedKlssPointers::base does not >> change in production run >> >> Signed-off-by: Ashutosh Mehra >> - Fix merge conflicts >> >> Signed-off-by: Ashutosh Mehra >> - Store/load AsmRemarks and DbgStrings in aot code cache >> >> Signed-off-by: Ashutosh Mehra >> - Add missing external address in aarch64 >> >> Signed-off-by: Ashutosh Mehra >> - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab > > I think it is fine not use AOT code when the heap base does not match. Based on my test it will happen only with big heap's size difference between assembly and production runs. @vnkozlov I wonder if that assert needs modification as well. This assert assumes that if `CompressedOops::_base` is not null, then it will be set to a page size before the heap range start. 
This is fine because _base is not null only when heap range goes beyond `OopEncodingHeapMax`, and in such cases `ReservedHeapSpace::noaccess_prefix` is equal to `os::vm_page_size()`. But with `UseCompatibleCompressedOops` the _base is set to non-null value even when heap range is within `OopEncodingHeapMax` and in such cases `ReservedHeapSpace::noaccess_prefix` is 0. I think the assert should be using `noaccess_prefix` instead of hard-coding `os::vm_page_size`: - assert((intptr_t)base() <= ((intptr_t)_heap_address_range.start() - (intptr_t)os::vm_page_size()) || + assert((intptr_t)base() <= ((intptr_t)_heap_address_range.start() - (intptr_t)heap_space.noaccess_prefix()) || With this change I can run the helloworld program with `UseCompatibleCompressedOops` enabled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860472892 From vlivanov at openjdk.org Wed May 7 21:53:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 May 2025 21:53:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:40:05 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: > > - Making _features_bitmap size configurable > - cleanups & refactorings There are some SA-related failures. Fixed by [1]. Otherwise, testing results are good. [1] https://github.com/iwanowww/jdk/commit/9100ef190befbb1967f477532a0776c135a9b728 src/hotspot/cpu/x86/vm_version_x86.hpp line 458: > 456: > 457: private: > 458: uint64_t _features_bitmap[(MAX_CPU_FEATURES >> 6) + 1]; Suggestion: uint64_t _features_bitmap[(MAX_CPU_FEATURES / BitsPerLong) + 1]; src/hotspot/cpu/x86/vm_version_x86.hpp line 460: > 458: uint64_t _features_bitmap[(MAX_CPU_FEATURES >> 6) + 1]; > 459: > 460: STATIC_ASSERT(sizeof(_features_bitmap) * BitsPerByte > MAX_CPU_FEATURES); Suggestion: STATIC_ASSERT(sizeof(_features_bitmap) * BitsPerByte >= MAX_CPU_FEATURES); ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2822970103 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2078346536 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2078354983 From vlivanov at openjdk.org Wed May 7 21:53:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 7 May 2025 21:53:59 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v15] In-Reply-To: 
<4qUlnS5IhZxUDg2w5C3aAo_saQ1IXSnbkmSNwpgzpes=.092d9c5d-836d-41d9-aa9b-e94c4520fea7@github.com> References: <4qUlnS5IhZxUDg2w5C3aAo_saQ1IXSnbkmSNwpgzpes=.092d9c5d-836d-41d9-aa9b-e94c4520fea7@github.com> Message-ID: On Wed, 7 May 2025 15:28:09 GMT, Yudi Zheng wrote: >> Jatin Bhateja has updated the pull request incrementally with two additional commits since the last revision: >> >> - Making _features_bitmap size configurable >> - cleanups & refactorings > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotJVMCIBackendFactory.java line 121: > >> 119: long featureIndex = bitIndex >>> featuresElementShiftCount; >> 120: long featureBitMask = 1L << (bitIndex & featuresElementMask); >> 121: assert featureIndex < featuresBitMapSize; > > `featuresBitMapSize` is size in bytes while `featureIndex` is index to long array Good catch, Yudi. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2078544595 From sparasa at openjdk.org Wed May 7 22:08:54 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 7 May 2025 22:08:54 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 15:31:18 GMT, Srinivas Vamsi Parasa wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup ecmov, eorw and other refactoring > > Hi Sandhya (@sviswa7) and Jatin (@jatin-bhateja), > > Could you please review the refactored changes? > > Thanks, > Vamsi > @vamsi-parasa @sviswa7 Did you already test this with `sde` and the `-future` flag? Once this is fully reviewed I can also run our internal testing, just let me know when you are ready :) Hi Emanuel (@eme64), Thank you for the message! We're waiting for one more review from Jatin. Will let you know when that's completed. 
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2860510641 From kvn at openjdk.org Wed May 7 22:32:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 22:32:54 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... 
and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab I also hit failure in `premain` testing in runtime/cds/appcds/leyden/LeydenHello.java#aot test on Windows-x64 due to wrong Compress Class encoding due to different base: [0.031s][info][cds] The current max heap size = 1024M, G1HeapRegion::GrainBytes = 1048576 [0.031s][info][cds] narrow_klass_base = 0x0000015298000000, arrow_klass_pointer_bits = 32, narrow_klass_shift = 0 AOT code uses 0x800000000 instead : 88: 8b 78 08 mov edi,DWORD PTR [rax+0x8] 8b: 49 ba 00 00 00 00 08 movabs r10,0x800000000 92: 00 00 00 95: 49 03 fa add rdi,r10 98: 48 3b 5f 38 cmp rbx,QWORD PTR [rdi+0x38] So we need the relocation fix from current changes in `premain` branch too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860595058 From kvn at openjdk.org Wed May 7 22:43:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 22:43:53 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: <9EQGY48c-yRgmnLAPd3wXy1JsQ7xbiyzFXlHZHSuEqY=.94aba91d-a006-4832-967d-fb90da38ce6f@github.com> References: <9EQGY48c-yRgmnLAPd3wXy1JsQ7xbiyzFXlHZHSuEqY=.94aba91d-a006-4832-967d-fb90da38ce6f@github.com> Message-ID: On Wed, 7 May 2025 21:50:04 GMT, Ashutosh Mehra wrote: > With this change I can run the helloworld program with UseCompatibleCompressedOops enabled. I will try your assert fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860612123 From kvn at openjdk.org Wed May 7 23:15:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 7 May 2025 23:15:54 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. 
>> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab Several AOT tests failed on linux-x64 with flag enabled: $ make test CONF=fast TEST=test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking TEST_OPTS_JAVA_OPTIONS="-Xmixed -XX:+UseCompatibleCompressedOops" ... TEST TOTAL PASS FAIL ERROR SKIP jtreg:open/test/hotspot/jtreg/runtime/cds/appcds/aotClassLinking >> 15 5 8 0 2 I moved `UseCompatibleCompressedOops` flag setting to `check_vm_args_consistency()` to affect current workflow. And I used `if (AOTCodeCache::is_caching_enabled()) {` for its setting. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860658834 PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860664022 From sviswanathan at openjdk.org Thu May 8 00:18:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 8 May 2025 00:18:52 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result In-Reply-To: References: Message-ID: <1s6dHPf6iddBRe5ide_kmaws8HRKfm80gRWjE0raZ7w=.c549a64f-cebb-4632-bcf8-64c55f9ff2d0@github.com> On Wed, 7 May 2025 16:05:53 GMT, Srinivas Vamsi Parasa wrote: > This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. > > The test passes after using this fix as shown below: > > Passed: compiler/c2/irTests/TestFPComparison.java > Test results: passed: 1 > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java > 1 1 0 0 0 > ============================== > TEST SUCCESS cmovP_regUCF_ndd instruct doesn't have UseAPX as predicate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25101#issuecomment-2860898836 From kvn at openjdk.org Thu May 8 00:18:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 00:18:55 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. 
> > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab I think for these changes we should not use AOT code when the heap base does not match. Something changed in compressed oops code which prevents enforcing encoding. We can investigate and fix it later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2860897941 From john.r.rose at oracle.com Thu May 8 00:31:28 2025 From: john.r.rose at oracle.com (John Rose) Date: Wed, 07 May 2025 17:31:28 -0700 Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: <3C8B5BF7-0024-44A9-8198-F2AF30F1C5BB@oracle.com> On 7 May 2025, at 12:37, Vladimir Kozlov wrote: > I think it is fine not use AOT code when the heap base does not match. Based on my test it will happen only with big heap's size difference between assembly and production runs. Won't ASLR spoil matches on some platforms? The match must depend on how the VM negotiates memory segments with the OS.
If the OS is committed to random segment assignment, then routine mismatch is something we should be prepared to address, as the issue comes up. (I'm not saying we need to address it now, if tests seem to behave properly. But I suspect even now there are some platforms that will throw a monkey wrench at us.) ASLR: https://en.wikipedia.org/wiki/Address_space_layout_randomization From kvn at openjdk.org Thu May 8 01:19:06 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 01:19:06 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants.
> Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab @rose00 We have solution for that: we enforce encoding instructions (UseCompatibleCompressedOops flag) and use relocation info to patch correct COOP base address when loading AOT code. The current issue is something changed in COOP code in runtime which cause next assert hit when we enforce encoding and use AOT code: [instanceKlass.cpp#L794](https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/oops/instanceKlass.cpp#L794) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2861107654 From asmehra at openjdk.org Thu May 8 01:36:52 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 8 May 2025 01:36:52 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: <8W_FRkLbamdZ6l0Lkbn8WqXv_JXPjG-i5hBus2foor4=.4f80cd55-4141-46ff-8436-0cbbc9228461@github.com> On Thu, 8 May 2025 00:15:46 GMT, Vladimir Kozlov wrote: > I think for these changes we should not use AOT code when the heap base does not match. Something changed in compressed oops code which prevents enforcing encoding. We can investigate and fix it later. @vnkozlov for this PR we are relying on having relocation for COOP base, not on enforcing encoding. And that should be able to handle cases where heap base is different in assembly vs prod. Why do you suggest to not use AOT code when the heap base does not match? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2861154733 From xgong at openjdk.org Thu May 8 01:53:58 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 8 May 2025 01:53:58 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:02:43 GMT, Jatin Bhateja wrote: >> Hi @jatin-bhateja It is feasible. But I was thinking about whether another solution would be better, which is to turn `VectorMask.fromLong(SPECIES, -1L)` into `MaskAll(true)` in the mid-end. In this way, we don't need to check this pattern in this optimization. What do you think ? > > Yes, that's the right approach. For this PR, I think you can mix some test points covering compare, xor(maskAll(true)). Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be better, and that will benefit AArch64 as well, since `MaskAll()` is much more cheaper than `fromLong()` on AArch64. We can add such a transformation with another PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2078760074 From kvn at openjdk.org Thu May 8 02:17:00 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 02:17:00 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. > Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab To support that you would need to modify x86_64.ad to exclude r12 from usage as 0 as we discussed. I think that using your original changes to not use AOT code with mismatched coop base is simpler for these changes and more robust for mainline. I concern that we may missing some places which will cause issues later. And we near JDK 25 fork. We can do more changes/improvements later in JDK 26. One suggestion I have is to skip using only new runtime blobs you are adding when bases do not match because adapters don't have this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2861272270 From vlivanov at openjdk.org Thu May 8 02:40:39 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 May 2025 02:40:39 GMT Subject: RFR: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion Message-ID: Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. 
Proposed fix adjusts the assert. Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. Testing: hs-tier1 - hs-tier5 ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25110/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25110&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356453 Stats: 11 lines in 2 files changed: 9 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25110.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25110/head:pull/25110 PR: https://git.openjdk.org/jdk/pull/25110 From amitkumar at openjdk.org Thu May 8 05:08:56 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 8 May 2025 05:08:56 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 
0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation Hi Martin, So what will be next step here ? Should I put this question in community mailing list ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2861785718 From thartmann at openjdk.org Thu May 8 05:37:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 05:37:51 GMT Subject: RFR: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion In-Reply-To: References: Message-ID: On Thu, 8 May 2025 01:48:17 GMT, Vladimir Ivanov wrote: > Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. 
The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. > > Proposed fix adjusts the assert. > > Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. > > Testing: hs-tier1 - hs-tier5 Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25110#pullrequestreview-2823859522 From thartmann at openjdk.org Thu May 8 06:54:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 06:54:02 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Wed, 7 May 2025 08:27:54 GMT, kuaiwei wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Add missing size_of() in machnode.hpp src/hotspot/share/opto/intrinsicnode.hpp line 201: > 199: virtual Node* Ideal(PhaseGVN* phase, bool can_reshape); > 200: virtual const Type* Value(PhaseGVN* phase) const; > 201: virtual uint size_of() const { return sizeof(EncodeISOArrayNode); } Shouldn't nodes that define fields also override `Node::cmp` and `Node::hash`? 
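As a rough illustration of the question above: a node class that adds fields generally needs `size_of()` to report its own size, and `hash()`/`cmp()` to take the new field into account so that value numbering does not treat distinct nodes as equal. The sketch below uses simplified, hypothetical stand-ins (`NodeSketch`, `FlagNodeSketch`, `ForgetfulNodeSketch`), not the real `Node` or `EncodeISOArrayNode` classes, and its signatures only loosely mirror HotSpot's.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a C2 IR node base class.
// The real HotSpot Node has many more members; only the three
// virtuals discussed in the review are modelled here.
struct NodeSketch {
  virtual ~NodeSketch() = default;
  // Size in bytes of the dynamic type; a raw-copy clone would rely on it.
  virtual std::size_t size_of() const { return sizeof(NodeSketch); }
  // Hash and equality as used by value numbering (assumed semantics).
  virtual unsigned hash() const { return 17u; }
  virtual bool cmp(const NodeSketch&) const { return true; }
};

// Hypothetical subclass that adds one field, similar to the bool flags
// mentioned in the PR description, and overrides all three virtuals.
struct FlagNodeSketch : NodeSketch {
  bool _ascii;
  explicit FlagNodeSketch(bool ascii) : _ascii(ascii) {}
  std::size_t size_of() const override { return sizeof(*this); }
  unsigned hash() const override {
    return NodeSketch::hash() + (_ascii ? 1u : 0u);
  }
  bool cmp(const NodeSketch& other) const override {
    // Assumption: callers only compare nodes of the same class.
    return _ascii == static_cast<const FlagNodeSketch&>(other)._ascii;
  }
};

// Variant that forgets the overrides: it still reports the base-class
// size and compares equal regardless of the flag value.
struct ForgetfulNodeSketch : NodeSketch {
  bool _ascii;
  explicit ForgetfulNodeSketch(bool ascii) : _ascii(ascii) {}
};
```

In this model, a `ForgetfulNodeSketch` reports `sizeof(NodeSketch)` from `size_of()`, which is the kind of mismatch the PR describes as harmless only while alignment padding happens to cover the extra field.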
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2079015217 From thartmann at openjdk.org Thu May 8 06:54:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 06:54:02 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Thu, 8 May 2025 06:47:14 GMT, Tobias Hartmann wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing size_of() in machnode.hpp > > src/hotspot/share/opto/intrinsicnode.hpp line 201: > >> 199: virtual Node* Ideal(PhaseGVN* phase, bool can_reshape); >> 200: virtual const Type* Value(PhaseGVN* phase) const; >> 201: virtual uint size_of() const { return sizeof(EncodeISOArrayNode); } > > Shouldn't nodes that define fields also override `Node::cmp` and `Node::hash`? Ah, I see that Christian already asked that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2079020465 From adinn at openjdk.org Thu May 8 07:47:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 8 May 2025 07:47:59 GMT Subject: RFR: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. 
Thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25094#issuecomment-2862090921 From adinn at openjdk.org Thu May 8 07:47:59 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 8 May 2025 07:47:59 GMT Subject: Integrated: 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:33:11 GMT, Andrew Dinn wrote: > This patch merges the ZGC-specific component of the compiler stubs buffer size configuration into the default size. The stubs are actually independent of ZGC but the extra space is depended on by normal builds that include ZGC which means that cross-compile builds which exclude ZGC are failing. Now the space is the same in either case. This pull request has now been integrated. Changeset: daf6fa1e Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/daf6fa1e6153d3fdf48ef0840790794e57349c38 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8356085: AArch64: compiler stub buffer size wrongly depends on ZGC Reviewed-by: shade, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25094 From adinn at openjdk.org Thu May 8 08:11:54 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 8 May 2025 08:11:54 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 02:14:41 GMT, Vladimir Kozlov wrote: > One suggestion I have is to skip using only new runtime blobs you are adding when bases do not match because adapters don't have this issue. c2i adapters do suffer this issue. They call out to the barrier set to plant a c2i entry barrier. The barrier performs a weak reference load through field `ClassLoaderData::holder` (type `WeakHandle`). Method `resolve_weak_handle` calls `decode_heap_oop`. 
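To make the heap-base dependency concrete: compressed-oop decoding is commonly described as `base + (narrow << shift)`, so code with the assembly-time base baked in decodes to a different address once the base moves. The following is a minimal sketch under that assumption, with hypothetical names (`CoopEncodingSketch`, `decode_sketch`, `encodings_match_sketch`); it is not the real `CompressedOops` API and does not model relocation patching.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a compressed-oops encoding: heap base plus shift.
struct CoopEncodingSketch {
  std::uint64_t base;
  unsigned shift;
};

// Decode a narrow oop under the given encoding; a narrow value of 0
// is treated as null and stays 0.
inline std::uint64_t decode_sketch(const CoopEncodingSketch& e,
                                   std::uint32_t narrow) {
  if (narrow == 0) return 0;
  return e.base + (static_cast<std::uint64_t>(narrow) << e.shift);
}

// Conservative policy discussed in the thread: reuse cached code only
// when the production encoding equals the one seen at assembly time.
inline bool encodings_match_sketch(const CoopEncodingSketch& assembly,
                                   const CoopEncodingSketch& production) {
  return assembly.base == production.base &&
         assembly.shift == production.shift;
}
```

With two encodings that differ only in `base`, the same narrow value decodes to two different addresses, which is why cached code must either be rejected on mismatch or have the base patched through relocation info.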
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2862157506 From chagedorn at openjdk.org Thu May 8 08:18:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 08:18:58 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v4] In-Reply-To: References: Message-ID: <8nLXv4E9uO3iolWrosCOU1dXhFjkW3OhHd3qlwRVnl0=.37f9054a-dabe-45e8-bcbb-f16995444628@github.com> On Wed, 7 May 2025 11:21:45 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. 
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments Two more minor things but then it's good to go from my side :-) src/hotspot/share/opto/macro.cpp line 2353: > 2351: } > 2352: > 2353: //------------------refine_strip_mined_loop_macro_node------------------- Those headers were used in legacy code and you still see quite some of them around. But nowadays, we usually remove them when touching these methods or at least we don't add them for new methods. Suggestion: src/hotspot/share/opto/macro.cpp line 2471: > 2469: // Returns true if a failure occurred. > 2470: bool PhaseMacroExpand::expand_macro_nodes() { > 2471: // Perform refining of strip mined loop before expanding macro nodes. You can probably now remove this comment since the method name is expressive enough. Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24890#pullrequestreview-2824220024 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079148056 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079149958 From duke at openjdk.org Thu May 8 08:23:54 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 8 May 2025 08:23:54 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v4] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:21:45 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. 
>> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments Thank you for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24890#issuecomment-2862183220 From duke at openjdk.org Thu May 8 08:23:55 2025 From: duke at openjdk.org (duke) Date: Thu, 8 May 2025 08:23:55 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v4] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:21:45 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. 
>> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > addressing review comments @sarannat Your change (at version e8bcbc9b07ff2310d747e112b692910158bf5432) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24890#issuecomment-2862187928 From chagedorn at openjdk.org Thu May 8 08:39:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 08:39:52 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Wed, 7 May 2025 07:24:30 GMT, Christian Hagedorn wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. 
In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. > > Good catch! It would currently only be a problem when we clone nodes which is probably hard to check statically (could, for example, be part of a loop body and then be cloned). > > Some questions: > - Have you also checked the Mach nodes? > - Have you also checked that `cmp()` is overridden in case `hash()` is not `NO_HASH` for those nodes that specify at least one field? > > Just a side node, you can also just use `sizeof(*this)` which is often done in the code. > @chhagedorn I checked `machnode.hpp` manually and found some of them still miss `size_of()` . I added them in new patch. Thanks. Nice! > I checked node list in share/opto/classes.hpp, so MachNode/MachNullCheckNode/MachProjNode are checked. For mach nodes created by adlc, I found adlc will always add size_of function. Thanks for checking that. > I haven't checked cmp() and hash() , I will check if my test can cover these. Sounds good, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2862231039 From syan at openjdk.org Thu May 8 09:16:52 2025 From: syan at openjdk.org (SendaoYan) Date: Thu, 8 May 2025 09:16:52 GMT Subject: RFR: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion In-Reply-To: References: Message-ID: On Thu, 8 May 2025 01:48:17 GMT, Vladimir Ivanov wrote: > Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. > > Proposed fix adjusts the assert. 
> > Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. > > Testing: hs-tier1 - hs-tier5 Changes requested by syan (Committer). test/hotspot/jtreg/compiler/vectorapi/VectorBoxExpandTest.java line 44: > 42: private static int[] iarr = new int[ARR_LEN]; > 43: private static IntVector g; > 44: private static int acc = 0; Maybe we should update the copyright year. ------------- PR Review: https://git.openjdk.org/jdk/pull/25110#pullrequestreview-2824388298 PR Review Comment: https://git.openjdk.org/jdk/pull/25110#discussion_r2079251986 From epeter at openjdk.org Thu May 8 09:42:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 09:42:55 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v6] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 08:55:30 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > correct driver path It looks good to me. But before you integrate: I also launched testing again, just in case. Please ping me again in 24h for the results :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24382#pullrequestreview-2824490373 From duke at openjdk.org Thu May 8 09:47:11 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 8 May 2025 09:47:11 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v5] In-Reply-To: References: Message-ID: > Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. > > Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. > > Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. 
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: removing redundant comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24890/files - new: https://git.openjdk.org/jdk/pull/24890/files/e8bcbc9b..afc2ef7e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24890/head:pull/24890 PR: https://git.openjdk.org/jdk/pull/24890 From mdoerr at openjdk.org Thu May 8 09:48:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 8 May 2025 09:48:53 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: <60EKzMgxd8YXL1DaKBnEYxGpf-WYhiqOFTzqCvJcYzk=.b923d1ee-8698-44e1-9876-afa708d05cd2@github.com> On Thu, 8 May 2025 05:06:04 GMT, Amit Kumar wrote: > Hi Martin, So what will be next step here ? Should I put this question in community mailing list ? You can try. `java.nio` and FFM API may have some requirements and expectations for such `Unsafe` operations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2862434689 From chagedorn at openjdk.org Thu May 8 11:13:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 11:13:08 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 13:30:35 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. 
Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. 
To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: > > - review > - Merge branch 'master' into JDK-8342692 > - merge fix > - Merge branch 'master' into JDK-8342692 > - merge fix > - Merge branch 'master' into JDK-8342692 > - merge > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - whitespace > - ... and 33 more: https://git.openjdk.org/jdk/compare/4458719a...ed774a56 Interesting work and results with the benchmark! I have a few comments. Will have another look later. src/hotspot/share/opto/castnode.hpp line 146: > 144: bool cmp_used_at_inner_loop_exit_test(Node* cmp); > 145: > 146: bool used_at_inner_loop_exit_test(); I suggest to compress them: Suggestion: bool inner_loop_backedge(Node* proj); bool cmp_used_at_inner_loop_exit_test(Node* cmp); bool used_at_inner_loop_exit_test(); src/hotspot/share/opto/graphKit.cpp line 4055: > 4053: // Will narrow the limit down with a cast node. Predicates added later may depend on the cast so should be last when > 4054: // starting from the loop. > 4055: add_parse_predicate(Deoptimization::Reason_short_running_long_loop, nargs); Same here, we should probably guard this creation with `ShortRunningLongLoop`? src/hotspot/share/opto/loopTransform.cpp line 122: > 120: Node* limit_n = cl->limit(); > 121: if (init_n != nullptr && limit_n != nullptr) { > 122: // Use longs to avoid integer overflow. This comment is outdated now since we now also face long overflows.
Can you update it and maybe quickly summarize the tricks we do to do that computation with `jlong` without running into overflows? src/hotspot/share/opto/loopnode.cpp line 1094: > 1092: SafePointNode* cloned_sfpt = old_new[safepoint->_idx]->as_SafePoint(); > 1093: > 1094: add_parse_predicate(Deoptimization::Reason_short_running_long_loop, inner_head, outer_ilt, cloned_sfpt); We should probably guard this creation with `ShortRunningLongLoop`? src/hotspot/share/opto/loopnode.cpp line 1178: > 1176: > 1177: // If bounds are known is the loop doesn't need an outer loop or profile data indicates it runs for less than > 1178: // ShortLoopIter, don't create the outer loop Since `ShortLoopIter` was removed, this should also be updated. Can you also add some more details here as written in the PR description? You could also add some information from the follow-up discussion, for example, why we don't want a ShortLoopIter flag. src/hotspot/share/opto/loopnode.cpp line 1184: > 1182: } > 1183: Node* x = loop->_head; > 1184: BaseCountedLoopNode* head = x->as_BaseCountedLoop(); You can probably merge them together since you don't seem to reuse `x` again: Suggestion: BaseCountedLoopNode* head = loop->_head->as_BaseCountedLoop(); src/hotspot/share/opto/loopnode.cpp line 1195: > 1193: loop->compute_profile_trip_cnt(this); > 1194: if (StressShortRunningLongLoop) { > 1195: profile_short_running_loop = true; Suggestion: profile_short_running_loop = true; src/hotspot/share/opto/loopnode.cpp line 1219: > 1217: const Type* new_phi_t = TypeInt::INT; > 1218: if (profile_short_running_loop) { > 1219: // Add a short_limit predicate. It's the last predicate when coming from the loop because a cast that's control I suggest to be consistent with the names to avoid confusion. We could name this "Short Running Long Loop (Parse) Predicate" to be aligned with `Deoptimization::Reason_short_running_long_loop` and the other suggestion in `predicates.hpp` about the predicate block name. 
What do you think? src/hotspot/share/opto/loopnode.cpp line 1222: > 1220: // dependent on the short_limit predicate is added to narrow the limit and future predicates may be dependent on the > 1221: // new limit (so have to be between the loop and short_limit predicate). The current limit could, itself, be > 1222: // dependent on an existing predicate. Clone parse predicates below existing predicates to get proper ordering of You should also mention the cloning of Template Assertion Predicates here. src/hotspot/share/opto/loopnode.cpp line 1223: > 1221: // new limit (so have to be between the loop and short_limit predicate). The current limit could, itself, be > 1222: // dependent on an existing predicate. Clone parse predicates below existing predicates to get proper ordering of > 1223: // predicates when coming from the loop: future predicates, short_limit predicate, existing predicates. Maybe be more explicit: Suggestion: // predicates when walking from the loop up: future predicates, short_limit predicate, existing predicates. 
A visualization might also help here to quickly grasp the structure: // Existing Hoisted // Check Predicates // | // New Short Running Long // Loop Predicate // | // Cloned Parse Predicates and // Template Assertion Predicates // | // Loop src/hotspot/share/opto/loopnode.hpp line 1832: > 1830: Node* ensure_node_and_inputs_are_above_pre_end(CountedLoopEndNode* pre_end, Node* node); > 1831: > 1832: bool short_running_loop(IdealLoopTree* loop, jint stride_con, const Node_List &range_checks, uint iters_limit); Suggestion: bool short_running_loop(IdealLoopTree* loop, jint stride_con, const Node_List& range_checks, uint iters_limit); src/hotspot/share/opto/predicates.hpp line 790: > 788: } > 789: PredicateBlockIterator short_running_loop_predicate_iterator(current_node, Deoptimization::Reason_short_running_long_loop); > 790: return short_running_loop_predicate_iterator.for_each(predicate_visitor); General note: Can you also add a description of this new predicate to the header comment of `predicates.hpp` where we list and explains all the different predicates in C2? I've noticed that the new Auto Vectorization Parse Predicate should also be added there @eme64 (separately). src/hotspot/share/opto/predicates.hpp line 957: > 955: const PredicateBlock _profiled_loop_predicate_block; > 956: const PredicateBlock _loop_predicate_block; > 957: const PredicateBlock _short_running_loop_predicate_block; I suggest to align the `Deoptimization::Reason_short_running_long_loop` name with the block name here: Suggestion: const PredicateBlock _short_running_long_loop_predicate_block; src/hotspot/share/runtime/deoptimization.hpp line 122: > 120: Reason_short_running_long_loop, // profile reports loop runs for small number of iterations > 121: #if INCLUDE_JVMCI > 122: Reason_aliasing = Reason_short_running_long_loop, // optimistic assumption about aliasing failed Why is that required? 
src/hotspot/share/utilities/globalDefinitions.hpp line 785: > 783: } > 784: > 785: Suggestion: ------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2824686810 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079474327 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079475375 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079477325 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079464801 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079453537 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079486077 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079455041 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079492232 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079503060 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079505062 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079456614 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079468192 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079463198 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079459794 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079460274 From chagedorn at openjdk.org Thu May 8 11:13:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 11:13:09 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 10:58:42 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 43 commits: >> >> - review >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge >> - Merge branch 'master' into JDK-8342692 >> - Merge branch 'master' into JDK-8342692 >> - whitespace >> - ... and 33 more: https://git.openjdk.org/jdk/compare/4458719a...ed774a56 > > src/hotspot/share/opto/loopnode.cpp line 1219: > >> 1217: const Type* new_phi_t = TypeInt::INT; >> 1218: if (profile_short_running_loop) { >> 1219: // Add a short_limit predicate. It's the last predicate when coming from the loop because a cast that's control > > I suggest to be consistent with the names to avoid confusion. We could name this "Short Running Long Loop (Parse) Predicate" to be aligned with `Deoptimization::Reason_short_running_long_loop` and the other suggestion in `predicates.hpp` about the predicate block name. What do you think? I suggest to switch and say: It's the first predicate in the predicate chain before entering a loop [...] 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2079494161 From epeter at openjdk.org Thu May 8 11:29:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:01 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
> > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). @roberto Thanks a lot for taking the time to explain how implicit null checks work, and giving me some background for the PR :) Below, I have mostly code style / naming suggestions, that you are welcome to use as inspiration. But you do not have to apply any of them, it is totally up to you :) I'm definitely not an expert here, but your approach seems reasonable to me. The opt-in annotation `ins_is_late_expanded_null_check_candidate` makes sure we only do the optimization when we are sure it is ok. It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. But I suppose that cannot happen, because the GC only moves the pointer, so if the old pointer was non-null, the new pointer must be non-null as well. Maybe that was so trivial that you did not even understand my question there ? But it could be helpful to write that down somewhere, just to make sure people are aware of this. I think I was also worried that we would re-load the pointer itself. 
Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130: > 128: Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)), > 129: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); > 130: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); For context: 132 /* Sometimes we get misaligned loads and stores, usually from Unsafe 133 accesses, and these can exceed the offset range. */ 134 Address legitimize_address(const Address &a, int size, Register scratch) { 135 if (a.getMode() == Address::base_plus_offset) { 136 if (! Address::offset_ok_for_immed(a.offset(), exact_log2(size))) { 137 block_comment("legitimize_address {"); 138 lea(scratch, a); 139 block_comment("} legitimize_address"); 140 return Address(scratch); 141 } 142 } 143 return a; 144 } I wonder if it might be worth to create a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you. src/hotspot/share/opto/block.hpp line 468: > 466: > 467: // If necessary, hoist orphan node n into the end of block b. > 468: void maybe_hoist_into(Node* n, Block* b); Hmm. It is "if necessary" or "if possible"? I wonder if we could come up with a name that is a little longer and expresses this condition? 
src/hotspot/share/opto/lcm.cpp line 79: > 77: } > 78: > 79: void PhaseCFG::move_into(Node* n, Block* b) { Suggestion: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { src/hotspot/share/opto/lcm.cpp line 89: > 87: if (!out->is_MachProj()) { > 88: continue; > 89: } What about the `MachTemp`? Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name. src/hotspot/share/opto/lcm.cpp line 105: > 103: "need for recursive hoisting not expected"); > 104: move_into(n, b); > 105: } Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before? Suggestion to make it a bit more clear: Suggestion: // We want to ensure that n happens at b or before, i.e. at a block that dominates b. void PhaseCFG::ensure_node_is_at_block_or_before(Node* n, Block* b) { Block* current = get_block_for_node(n); if (current->dominates(b)) { return; // n already happens before b, do nothing. } // We only expect nodes without further inputs, like MachTemp or load Base. assert(n->req() == 0 || (n->req() == 1 && n->in(0) == (Node*)C->root()), "need for recursive hoisting not expected"); assert(b->dominates(current), "precondition: can only move n to b if b dominates n"); move_node_and_its_projections_to_block(n, b); } I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here. Maybe moving the `b->dominates(current)` assert down helps give good context? But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`... Do you have a better idea? 
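The precondition discussed in the comment above (a node is only moved up to `b` when `b` dominates the node's current block, and nothing happens when the node already sits at or above `b`) can be illustrated with a small standalone sketch. This is not HotSpot code: `ToyBlock`, `ToyNode` and `toy_ensure_at_block_or_before` are hypothetical stand-ins, and the dominance test simply walks an immediate-dominator chain, which is an assumption about how such a check could be implemented.

```cpp
#include <cassert>

// Hypothetical stand-in for a CFG block; only the immediate dominator is modeled.
struct ToyBlock {
  const ToyBlock* idom;  // immediate dominator, nullptr for the entry block

  // True if this block dominates 'other' (a block dominates itself).
  bool dominates(const ToyBlock* other) const {
    for (const ToyBlock* b = other; b != nullptr; b = b->idom) {
      if (b == this) {
        return true;
      }
    }
    return false;
  }
};

// Hypothetical stand-in for a scheduled node such as a MachTemp.
struct ToyNode {
  const ToyBlock* block;  // block the node is currently placed in
  int num_inputs;         // number of data inputs
};

// Make sure n is placed at 'target' or at a block dominating it.
// Returns true if the node had to be moved.
bool toy_ensure_at_block_or_before(ToyNode& n, const ToyBlock* target) {
  if (n.block->dominates(target)) {
    return false;  // n already happens at or before target, nothing to do
  }
  // Only input-free nodes are expected, so no recursive hoisting is needed.
  assert(n.num_inputs == 0 && "need for recursive hoisting not expected");
  // Precondition: moving up is only legal if target dominates the current block.
  assert(target->dominates(n.block) && "can only move n to a dominating block");
  n.block = target;
  return true;
}
```

With a chain entry -> test -> successor, a temp placed in the successor is moved to the test block, while a node already in the entry block is left alone.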
src/hotspot/share/opto/lcm.cpp line 356: > 354: if (mach->in(j)->is_MachTemp()) { > 355: assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared"); > 356: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. Suggestion: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. // MachTemp have no inputs themselves and are only there to reserve a scratch // register for the GC barrier of the memory operation. That was what you told me in our offline meeting, I thought it was helpful context information. src/hotspot/share/opto/lcm.cpp line 428: > 426: maybe_hoist_into(val->in(i), block); > 427: } > 428: move_into(val, block); Suggestion: // Inputs of val may already be early enough, but if not move them together with val. ensure_node_is_at_block_or_before(val->in(i), block); } move_node_and_its_projections_to_block(val, block); src/hotspot/share/opto/lcm.cpp line 437: > 435: if (n == nullptr || !n->is_MachTemp()) { > 436: continue; > 437: } Do you want to check that all other nodes already dominate `block`? src/hotspot/share/opto/lcm.cpp line 439: > 437: } > 438: maybe_hoist_into(n, block); > 439: } It seems to me this is definitely new code, ensuring that we move the `MachTemp`. We did not do that before, at least not here. Correct? src/hotspot/share/opto/lcm.cpp line 441: > 439: map_node_to_block(n, block); > 440: } > 441: } This now happens in `move_into`, right? src/hotspot/share/opto/machnode.hpp line 391: > 389: > 390: // Whether this node is expanded during code emission into a sequence of > 391: // instructions and the first instruction can perform an implicit null check. You may want to put a warning / reasoning here, in case there are multiple loads. You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... 
maybe that is a sufficient argument, what do you think? test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51: > 49: * @requires vm.gc.Z > 50: * @run driver compiler.gcbarriers.TestImplicitNullChecks Z > 51: */ Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException. Of course, you would have to probably add `applyIf` to the `@IR` rules. test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119: > 117: testLoad(o); > 118: } catch (NullPointerException e) { nullPointerException = true; } > 119: Asserts.assertTrue(nullPointerException); Suggestion: try { testLoad(o); throw new RuntimeException("Should have thrown NullPointerException"); } catch (NullPointerException e) { /* expected */} Could be a shorter alternative. Up to you. Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of? But totally optional, as your approach works anyway :) test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140: > 138: // G1 and ZGC stores cannot be currently used to implement implicit null > 139: // checks, because they expand into multiple memory access instructions that > 140: // are not necessarily located at the initial instruction start address. Very random idea, no idea if it is any good: Why not do the implicit null-check with a fake Load? No idea on the implications here. I suppose it would be extra code, but at least not branching code? 
------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2824535603 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079357655 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079437197 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079476518 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079430920 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079473986 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079420601 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079480978 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079486097 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079509053 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079488019 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079491319 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079493683 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079500275 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079505342 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:21:14 GMT, Emanuel Peter wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). 
>> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > src/hotspot/share/opto/block.hpp line 468: > >> 466: >> 467: // If necessary, hoist orphan node n into the end of block b. >> 468: void maybe_hoist_into(Node* n, Block* b); > > Hmm. It is "if necessary" or "if possible"? > I wonder if we could come up with a name that is a little longer and expresses this condition? Ah no, I'm starting to understand that it is rather a `if necessary`... 
> src/hotspot/share/opto/lcm.cpp line 428: > >> 426: maybe_hoist_into(val->in(i), block); >> 427: } >> 428: move_into(val, block); > > Suggestion: > > // Inputs of val may already be early enough, but if not move them together with val. > ensure_node_is_at_block_or_before(val->in(i), block); > } > move_node_and_its_projections_to_block(val, block); It's a little hard to see here: did you just refactor this code, or make any changes? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079450181 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079507708 From epeter at openjdk.org Thu May 8 11:29:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 11:29:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:29:17 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/block.hpp line 468: >> >>> 466: >>> 467: // If necessary, hoist orphan node n into the end of block b. >>> 468: void maybe_hoist_into(Node* n, Block* b); >> >> Hmm. It is "if necessary" or "if possible"? >> I wonder if we could come up with a name that is a little longer and expresses this condition? > > Ah no, I'm starting to understand that it is rather a `if necessary`... See further comments at `maybe_hoist_into` and my suggestions. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079512983 From thartmann at openjdk.org Thu May 8 11:29:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 11:29:53 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 17:21:55 GMT, Christian Hagedorn wrote: >> In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: >> >> ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) >> >> These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: >> >> ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) >> >> We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape >> >> ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) >> >> which cannot be handled by the backend. >> >> The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. 
>> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add run with Xbatch Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25006#pullrequestreview-2824820288 From chagedorn at openjdk.org Thu May 8 11:36:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 11:36:07 GMT Subject: RFR: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 17:21:55 GMT, Christian Hagedorn wrote: >> In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: >> >> ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) >> >> These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: >> >> ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) >> >> We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape >> >> ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) >> >> which cannot be handled by the backend. 
>> >> The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > add run with Xbatch Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25006#issuecomment-2862721208 From chagedorn at openjdk.org Thu May 8 11:36:08 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 11:36:08 GMT Subject: Integrated: 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes In-Reply-To: References: Message-ID: On Fri, 2 May 2025 13:44:10 GMT, Christian Hagedorn wrote: > In the test case, we have two Initialized Assertion Predicate that share the same `Bool` which is perfectly fine: > > ![image](https://github.com/user-attachments/assets/a7a1673f-b5df-49e8-8a22-c35aa0ee1693) > > These Initialized Assertion Predicates were created for loops that have been folded away. They then end up in a new inner most loop which is partial peeled. Partial Peeling finds that we need to do the cut between `580 IfTrue` and `581 If`. This means, that the Initialized Assertion Predicate `569 RangeCheck` with its `535 OpaqueInitializedAssertionPredicate` is in the peel set and the second Initialized Assertion Predicate `504 RangeCheck` with its `503 OpaqueInitializedAssertionPredicate` is in the not peel set. 
As a result of that, we are introducing a `Phi` node between an `OpaqueInitializedAssertionPredicate` and a `Bool` node: > > ![image](https://github.com/user-attachments/assets/3bbc2b88-300a-4c40-99ac-056cbeab822a) > > We eventually remove the `OpaqueInitializedAssertionPredicate` and are left with the following graph shape > > ![image](https://github.com/user-attachments/assets/e839366f-119b-4411-b422-84639c68aa80) > > which cannot be handled by the backend. > > The fix I propose is to prohibit Partial Peeling from inserting such a `Phi` node by updating `clone_for_special_use_inside_loop()` which takes care of not inserting phis for an `If/Bool`. We need to also special case `OpaqueInitializedAssertionPredicate`. > > Thanks, > Christian This pull request has now been integrated. Changeset: b47b2062 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/b47b2062a2232694eb01473054a468ad9a6a2507 Stats: 59 lines in 2 files changed: 58 ins; 0 del; 1 mod 8355674: C2: Partial Peeling should not introduce Phi nodes above OpaqueInitializedAssertionPredicate nodes Reviewed-by: epeter, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25006 From chagedorn at openjdk.org Thu May 8 11:37:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 8 May 2025 11:37:59 GMT Subject: Integrated: 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates In-Reply-To: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> References: <9OvLoKhN1DbjqW9IpXAeQ-Xt-MwyBZXrseoIyXmjNqo=.eed50620-9c5d-4f2e-876c-c51e42fd8c2e@github.com> Message-ID: On Fri, 2 May 2025 14:14:26 GMT, Christian Hagedorn wrote: > Before the Assertion Predicate refactorings, we rewired data dependencies either to the newly created Initialized Assertion Predicates (for Loop Peeling) or to the zero trip guard (for main and post loops). 
Both was incomplete when we further split a loop - we missed to update these data dependencies accordingly. > > Now that the (almost) complete Assertion Predicate fix is in with [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577), we are now finally able to fix this by always rewiring the data dependencies to the Template Assertion Predicates which will be kept until either no more loop splitting can be done for a loop or until loop opts are over. > > We could have already fixed that with JDK-8350577 but it was simply missed. As an intermediate solution, we always rewired the data dependencies to the Initialized Assertion Predicates which only worked in some cases when the Initialized Assertion Predicates were folded away: They ended up at the Template Assertion Predicates above and from there we could update the data dependencies further. But if that did not happen, we could not find these data dependencies at the Template Assertion Predicates and failed to further update them when the loop was split again. As a result, we could perform some loads too early and crash (not observable, though). > > How we could end up with such a crash is described in the newly added regression test `testPeelingThreeTimesDataUpdate()`. Here is a snippet from the graph after applying Loop Peeling several times without the patch: > > ![image](https://github.com/user-attachments/assets/c40f5918-3ef4-4c4b-ab7d-dc4fdbf41fdf) > > All `LoadN` data dependencies are piled up at an Initialized Assertion Predicate from where we can no longer update them in further loop splitting optimizations because we only look at Template Assertion Predicates for that. By correctly rewiring the data dependencies to Template Assertion Predicates, we fix this which is proposed with this patch. > > This was found by a new stress peeling mode ([JDK-8355488](https://bugs.openjdk.org/browse/JDK-8355488)) @marc-chevalier > is currently working on. 
I was able to come up with a reproducer that does not use the new stressing but it shows that the new stressing is useful in finding hard to discover bugs. > > Thanks, > Christian This pull request has now been integrated. Changeset: ad07426f Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/ad07426fab3396caefd7c08d924e085c1f6f61ba Stats: 94 lines in 3 files changed: 79 ins; 1 del; 14 mod 8356084: C2: Data is wrongly rewired to Initialized Assertion Predicates instead of Template Assertion Predicates Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25007 From thartmann at openjdk.org Thu May 8 11:48:58 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 11:48:58 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v5] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 09:47:11 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. 
>> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > removing redundant comment Looks good to me otherwise. src/hotspot/share/opto/macro.cpp line 2354: > 2352: > 2353: //------------------refine_strip_mined_loop_macro_node------------------- > 2354: // Perform refining of strip mined loop node in the macro nodes list. Suggestion: // Perform refining of strip mined loop nodes in the macro nodes list. src/hotspot/share/opto/macro.cpp line 2355: > 2353: //------------------refine_strip_mined_loop_macro_node------------------- > 2354: // Perform refining of strip mined loop node in the macro nodes list. > 2355: void PhaseMacroExpand::refine_strip_mined_loop_macro_node() { Suggestion: void PhaseMacroExpand::refine_strip_mined_loop_macro_nodes() { src/hotspot/share/opto/macro.cpp line 2471: > 2469: // Returns true if a failure occurred. > 2470: bool PhaseMacroExpand::expand_macro_nodes() { > 2471: refine_strip_mined_loop_macro_node(); Suggestion: refine_strip_mined_loop_macro_nodes(); src/hotspot/share/opto/macro.hpp line 206: > 204: } > 205: > 206: void refine_strip_mined_loop_macro_node(); Suggestion: void refine_strip_mined_loop_macro_nodes(); ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24890#pullrequestreview-2824859469 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079556192 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079556743 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079556616 PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079557080 From thartmann at openjdk.org Thu May 8 11:49:00 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 11:49:00 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v4] In-Reply-To: <8nLXv4E9uO3iolWrosCOU1dXhFjkW3OhHd3qlwRVnl0=.37f9054a-dabe-45e8-bcbb-f16995444628@github.com> References: <8nLXv4E9uO3iolWrosCOU1dXhFjkW3OhHd3qlwRVnl0=.37f9054a-dabe-45e8-bcbb-f16995444628@github.com> Message-ID: On Thu, 8 May 2025 08:13:46 GMT, Christian Hagedorn wrote: >> Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review comments > > src/hotspot/share/opto/macro.cpp line 2353: > >> 2351: } >> 2352: >> 2353: //------------------refine_strip_mined_loop_macro_node------------------- > > Those headers were used in legacy code and you still see quite some of them around. But nowadays, we usually remove them when touching these methods or at least we don't add them for new methods. > > Suggestion: I agree, please remove this header. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079555444 From duke at openjdk.org Thu May 8 11:57:22 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 8 May 2025 11:57:22 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v3] In-Reply-To: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: > I wrote a test to check whether every C2 IR node has a correct `size_of()` function, and I found that some nodes are missing it: they added new fields but did not add a `size_of()` override to reflect the new size. On Linux this has not caused issues so far, because gcc allocates extra space for alignment, which can hold these additional `bool` flags. But it will report a failure on Windows, and it will cause problems if anyone modifies a base class. > > PS: My test is at https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it needs many hacks on the IR nodes to make the test run.
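The kind of check described above can be sketched outside of HotSpot. The following is a minimal illustration, not the test from the PR: `Node`, `NodeWithoutOverride`, `NodeWithOverride` and `size_of_is_consistent` are hypothetical stand-ins, and the only property shown is that a subclass which adds fields without overriding `size_of()` keeps reporting the base-class size. A large payload is used on purpose so that alignment padding cannot hide the difference.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a C2 node base class.
struct Node {
  virtual ~Node() = default;
  // Each subclass that adds fields is expected to override this.
  virtual std::size_t size_of() const { return sizeof(Node); }
};

// Adds fields but forgets the override: size_of() still reports sizeof(Node).
struct NodeWithoutOverride : Node {
  long _extra[4] = {0, 0, 0, 0};
};

// Adds fields and reports its own size.
struct NodeWithOverride : Node {
  long _extra[4] = {0, 0, 0, 0};
  std::size_t size_of() const override { return sizeof(NodeWithOverride); }
};

// True if the size reported at runtime matches the static size of T.
template <typename T>
bool size_of_is_consistent(const T& node) {
  return node.size_of() == sizeof(T);
}

// Small usage example.
void demo_size_of() {
  assert(!size_of_is_consistent(NodeWithoutOverride()));
  assert(size_of_is_consistent(NodeWithOverride()));
}
```

With a single `bool` field instead of the array, the static size of the subclass may or may not grow, depending on the ABI, which is consistent with the observation above that the problem stays hidden with gcc on Linux.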
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Add cmp()/hash() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25081/files - new: https://git.openjdk.org/jdk/pull/25081/files/1eb11ad0..9356c51d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=01-02 Stats: 14 lines in 3 files changed: 14 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25081/head:pull/25081 PR: https://git.openjdk.org/jdk/pull/25081 From duke at openjdk.org Thu May 8 12:04:52 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 8 May 2025 12:04:52 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Thu, 8 May 2025 06:51:08 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/intrinsicnode.hpp line 201: >> >>> 199: virtual Node* Ideal(PhaseGVN* phase, bool can_reshape); >>> 200: virtual const Type* Value(PhaseGVN* phase) const; >>> 201: virtual uint size_of() const { return sizeof(EncodeISOArrayNode); } >> >> Shouldn't nodes that define fields also override `Node::cmp` and `Node::hash`? > > Ah, I see that Christian already asked that :) I checked some nodes. `cmp/hash` are not always updated for new fields. It looks "nice to have". I have added `cmp/hash` for "EncodeISOArrayNode/ClearArrayNode/OpaqueMultiversioningNode".
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2079582101 From thartmann at openjdk.org Thu May 8 12:17:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 12:17:57 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: <7XtX737NV9bjyQWKxZK0rjNzQ1ye2IpbsuWTtI8Rh1s=.7e6bb289-50a1-45e2-906a-44348848a281@github.com> On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding All tests passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24919#issuecomment-2862849381 From thartmann at openjdk.org Thu May 8 12:19:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 8 May 2025 12:19:51 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Thu, 8 May 2025 12:01:58 GMT, kuaiwei wrote: >> Ah, I see that Christian already asked that :) > > I checked some nodes. `cmp/hash` are not always updated for new fields. It looks "nice to have". I have added `cmp/hash` for "EncodeISOArrayNode/ClearArrayNode/OpaqueMultiversioningNode". Thanks for doing that but it's not only nice to have, right? GVN might otherwise incorrectly common two different nodes.
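The concern about GVN commoning two different nodes can be illustrated with a toy value-numbering table. This is a sketch under stated assumptions, not HotSpot code: `Node`, `FlaggedNode`, `FlaggedNodeFixed`, the opcode constants and `find_or_insert` are all hypothetical, and the table is a plain linear scan. The assumption modelled here is that the base `cmp()` treats any extra state as equal, so a subclass with an extra field must override both `hash()` and `cmp()` to be told apart.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified node: an opcode plus input edges.
struct Node {
  int opcode;
  std::vector<const Node*> inputs;
  Node(int op, std::vector<const Node*> in) : opcode(op), inputs(std::move(in)) {}
  virtual ~Node() = default;
  // Base hash covers only opcode and inputs.
  virtual std::uint64_t hash() const {
    std::uint64_t h = static_cast<std::uint64_t>(opcode);
    for (const Node* in : inputs) h += reinterpret_cast<std::uintptr_t>(in);
    return h;
  }
  // Base cmp assumes there is no extra state to compare.
  virtual bool cmp(const Node&) const { return true; }
};

const int kFlaggedOp = 1;       // hypothetical opcode
const int kFlaggedFixedOp = 2;  // hypothetical opcode

// Extra field, but no hash()/cmp() overrides.
struct FlaggedNode : Node {
  bool flag;
  explicit FlaggedNode(bool f) : Node(kFlaggedOp, {}), flag(f) {}
};

// Extra field with matching hash()/cmp() overrides.
struct FlaggedNodeFixed : Node {
  bool flag;
  explicit FlaggedNodeFixed(bool f) : Node(kFlaggedFixedOp, {}), flag(f) {}
  std::uint64_t hash() const override { return Node::hash() + (flag ? 1 : 0); }
  bool cmp(const Node& other) const override {
    // Safe: find_or_insert only calls cmp() after the opcodes matched.
    return flag == static_cast<const FlaggedNodeFixed&>(other).flag;
  }
};

// Toy value numbering: return an existing equivalent node, or insert n.
const Node* find_or_insert(std::vector<const Node*>& table, const Node* n) {
  for (const Node* k : table) {
    if (k->opcode == n->opcode && k->inputs == n->inputs &&
        k->hash() == n->hash() && k->cmp(*n)) {
      return k;
    }
  }
  table.push_back(n);
  return n;
}

// Small usage example.
void demo_value_numbering() {
  std::vector<const Node*> table;
  FlaggedNode a(true), b(false);
  find_or_insert(table, &a);
  assert(find_or_insert(table, &b) == &a);  // b is wrongly replaced by a
}
```

In this sketch the two `FlaggedNode` instances differ only in `flag`, which the table never looks at, so the second one is replaced by the first. The `FlaggedNodeFixed` variant is kept distinct.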
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2079603397 From duke at openjdk.org Thu May 8 12:25:11 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 8 May 2025 12:25:11 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v6] In-Reply-To: References: Message-ID: > Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. > > Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs because the strip mined loop is refined in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix performs the refining of the strip mined loop before the elimination process. More specifically, it moves the call to the `adjust_strip_mined_loop` function outside the elimination loop. > > Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit node elimination in `expand_macro_nodes` is unintuitive, as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), and the current fix should be moved along with the other elimination code.
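The ordering problem described in the analysis can be modelled with a plain list. This is a rough sketch with hypothetical names (`MacroList`, `refine`, `eliminate_refining_inside`, `eliminate_refining_first`), not the code in `macro.cpp`: `refine` stands in for `adjust_strip_mined_loop` and is assumed to always append a `MaxL` entry, and the functions return `false` where the real code would hit the assertion.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { OuterStripMinedLoop, MaxL, Other };

// Hypothetical stand-in for the compiler's macro node list.
struct MacroList {
  std::vector<Kind> nodes;
};

// Stand-in for adjust_strip_mined_loop: refining adds a MaxL to the list.
void refine(MacroList& m) { m.nodes.push_back(Kind::MaxL); }

// Removes the entry at index i and checks that the list shrank by exactly one
// relative to old_count.
bool erase_and_check(MacroList& m, std::size_t i, std::size_t old_count) {
  m.nodes.erase(m.nodes.begin() + static_cast<std::ptrdiff_t>(i));
  return m.nodes.size() == old_count - 1;
}

// Old order: refine inside the elimination loop, right before eliminating.
bool eliminate_refining_inside(MacroList& m) {
  for (std::size_t i = m.nodes.size(); i > 0; i--) {
    if (m.nodes[i - 1] != Kind::OuterStripMinedLoop) continue;
    std::size_t old_count = m.nodes.size();
    refine(m);  // grows the list after old_count was taken
    if (!erase_and_check(m, i - 1, old_count)) return false;
  }
  return true;
}

// New order: refine all strip mined loops first, then eliminate.
bool eliminate_refining_first(MacroList& m) {
  std::size_t initial = m.nodes.size();
  for (std::size_t i = 0; i < initial; i++) {
    if (m.nodes[i] == Kind::OuterStripMinedLoop) refine(m);
  }
  for (std::size_t i = m.nodes.size(); i > 0; i--) {
    if (m.nodes[i - 1] != Kind::OuterStripMinedLoop) continue;
    std::size_t old_count = m.nodes.size();
    if (!erase_and_check(m, i - 1, old_count)) return false;
  }
  return true;
}

// Small usage example.
void demo_macro_elimination() {
  MacroList before{{Kind::Other, Kind::OuterStripMinedLoop}};
  MacroList after{{Kind::Other, Kind::OuterStripMinedLoop}};
  assert(!eliminate_refining_inside(before));  // count check fails
  assert(eliminate_refining_first(after));     // count check holds
}
```

In the first variant the list gains one entry and loses one between the two counts, so the "exactly one fewer" check fails. In the second variant the list is already in its final shape when elimination starts.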
Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: removing header and modifying method name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24890/files - new: https://git.openjdk.org/jdk/pull/24890/files/afc2ef7e..a50d9f4f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24890&range=04-05 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24890/head:pull/24890 PR: https://git.openjdk.org/jdk/pull/24890 From duke at openjdk.org Thu May 8 12:25:11 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 8 May 2025 12:25:11 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v4] In-Reply-To: References: <8nLXv4E9uO3iolWrosCOU1dXhFjkW3OhHd3qlwRVnl0=.37f9054a-dabe-45e8-bcbb-f16995444628@github.com> Message-ID: On Thu, 8 May 2025 11:44:09 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/macro.cpp line 2353: >> >>> 2351: } >>> 2352: >>> 2353: //------------------refine_strip_mined_loop_macro_node------------------- >> >> Those headers were used in legacy code and you still see quite some of them around. But nowadays, we usually remove them when touching these methods or at least we don't add them for new methods. >> >> Suggestion: > > I agree, please remove this header. Sorry, I missed this comment. I have fixed this in the latest commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24890#discussion_r2079611868 From shade at openjdk.org Thu May 8 12:39:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:39:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:30:00 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/runtime/vmStructs.cpp line 1266: > >> 1264: declare_toplevel_type(CDSFileMapRegion) \ >> 1265: declare_toplevel_type(UpcallStub::FrameData) \ >> 1266: declare_toplevel_type(UnloadableMethodHandle) \ > > So are these left for the async profiler? Yes, see https://github.com/async-profiler/async-profiler/issues/1260 that is filed already. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079634276 From shade at openjdk.org Thu May 8 12:42:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:42:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:28:10 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 35: > >> 33: #include "oops/weakHandle.inline.hpp" >> 34: >> 35: inline UnloadableMethodHandle::UnloadableMethodHandle(Method* method) { > > This should initialize method in the ctor initializer list. Maybe, but the field is not `const`, so there seem to be no point? We also assign after assert checks `method` for us. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079637960 From shade at openjdk.org Thu May 8 12:50:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 12:50:54 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 19:54:04 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/oops/unloadableMethodHandle.inline.hpp line 51: > >> 49: // Method holder class cannot be unloaded. >> 50: return nullptr; >> 51: } > > This is nice that this doesn't require creating a jni handle for unloadable class loaders with this change. Right? Wasteful to even go through all this dance for compiling JDK methods :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079651140 From duke at openjdk.org Thu May 8 13:30:42 2025 From: duke at openjdk.org (Manuel Hässig) Date: Thu, 8 May 2025 13:30:42 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock Message-ID: # Issue Summary This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode.
When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" areturn; // The following is unreachable iconst_0; This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 # Change Summary To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. # Testing - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) - [ ] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing # Acknowledgements Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. 
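The shape of the check discussed in the change summary can be sketched as follows. This is a simplified illustration with a hypothetical `Bytecode` enum, not the function in `deoptimization.cpp`; the pre-existing set of non-fallthrough bytecodes (`goto`, `goto_w`, `athrow`) is an assumption here, and the return bytecodes are the ones the description says it adds.

```cpp
#include <cassert>

// Hypothetical subset of JVM bytecodes, for illustration only.
enum class Bytecode {
  getstatic, iconst_0, goto_, goto_w, athrow,
  ireturn, lreturn, freturn, dreturn, areturn
};

// Returns false if execution never continues at the textually next bytecode.
bool falls_through(Bytecode bc) {
  switch (bc) {
    case Bytecode::goto_:    // assumed to be in the existing list
    case Bytecode::goto_w:   // assumed to be in the existing list
    case Bytecode::athrow:   // mentioned above as already handled
    case Bytecode::areturn:  // the immediate fix
    case Bytecode::dreturn:  // added for symmetry with areturn
    case Bytecode::freturn:
    case Bytecode::ireturn:
    case Bytecode::lreturn:
      return false;
    default:
      return true;
  }
}

// Stack verification of the next bytecode only makes sense if it is reached
// by falling through from the current one.
bool should_verify_next_bytecode(Bytecode current) {
  return falls_through(current);
}

// Small usage example.
void demo_falls_through() {
  assert(should_verify_next_bytecode(Bytecode::getstatic));
  assert(!should_verify_next_bytecode(Bytecode::areturn));
}
```

In the example from the issue summary, the deoptimization resumes at the `areturn`, so with this check the unreachable `iconst_0` that follows it is never inspected.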
------------- Commit messages: - Mark more bytecodes as non-fallthrough - Add regression tests for non-fallthrough bytecodes Changes: https://git.openjdk.org/jdk/pull/25118/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25118&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336906 Stats: 232 lines in 5 files changed: 227 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25118/head:pull/25118 PR: https://git.openjdk.org/jdk/pull/25118 From jbhateja at openjdk.org Thu May 8 13:49:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 May 2025 13:49:22 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v16] In-Reply-To: References: Message-ID: <9Luwvte-huLN0cjCqBAdvitAE6ZwqPjmiLJSOEpFt04=.b9d7f325-0e85-44a9-ae18-2f770260c4f6@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
> > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Reveiw suggestions incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/cfc09d05..8acbd7a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=14-15 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From shade at openjdk.org Thu May 8 14:33:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 8 May 2025 14:33:02 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> On Wed, 7 May 2025 20:18:29 GMT, Coleen Phillimore wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Move to oops > > src/hotspot/share/compiler/compileBroker.cpp line 1697: > >> 1695: JavaThread* thread = JavaThread::current(); >> 1696: >> 1697: methodHandle method(thread, task->method()); > > I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. Ah, that reminds me, thanks. I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. 
`compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2079838894 From jbhateja at openjdk.org Thu May 8 14:44:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 May 2025 14:44:43 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v17] In-Reply-To: References: Message-ID: <8tz0nbg5nt0WR_9Y_Zd_G2I26Dl8D4a5wBd0wBbrRQY=.2c71f9e8-8aa7-4a04-88df-d2ef018d73a8@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
> > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Code re-factoring from Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/8acbd7a6..1a3bce93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=15-16 Stats: 21 lines in 3 files changed: 7 ins; 7 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From yzheng at openjdk.org Thu May 8 14:49:42 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 8 May 2025 14:49:42 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v2] In-Reply-To: References: Message-ID: <8jZWccxTMyrcHsQEiyaf6_TmGLBXIGdfW2bJWcVHMaU=.98eb7ab1-dc5c-4611-a2a9-4ca04d606836@github.com> > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. 
Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Update javadoc - Merge remote-tracking branch 'upstream/master' into JDK-8353735 - [JVMCI] Allow specifying storage kind of the callee save register ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24451/files - new: https://git.openjdk.org/jdk/pull/24451/files/339b72ef..fcdfd10d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=00-01 Stats: 315273 lines in 3080 files changed: 101272 ins; 201200 del; 12801 mod Patch: https://git.openjdk.org/jdk/pull/24451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24451/head:pull/24451 PR: https://git.openjdk.org/jdk/pull/24451 From epeter at openjdk.org Thu May 8 14:56:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 14:56:56 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock In-Reply-To: References: Message-ID: <0W1NJ3CRAdeugnJTYVGVtomqYJEX5QdVEua9XPSWn5g=.d5b30054-2805-4b8c-a9f9-5b1cdbc12d2a@github.com> On Thu, 8 May 2025 13:22:55 GMT, Manuel Hässig wrote: > # Issue Summary > > This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode.
When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: > > getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" > areturn; > // The following is unreachable > iconst_0; > > > This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: > https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 > > # Change Summary > > To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. 
> > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) > - [ ] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing > > # Acknowledgements > Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. @mhaessig Thanks for looking into this! A few comments: 1) Did you go through all bytecodes we support here? I quickly scanned the wiki page, and found the (deprecated but still present) `jsr` opcodes that also are essentially a goto, so do not fall-through, I think. At least it seems to me we could also have unreachable code after a `jsr`, right? And then there is a `ret` bytecode that does the symmetrical thing. Not sure if we even handle these in such a way that the bug could be reproduced, but worth a check! Can you go over all bytecodes, and make sure we are not missing any? Because this is already the second bug of this kind, would be good to fix it once and for all now ;) 2) > Even though I was not able to reproduce the same crash with {d,f,i,l}return because I could not get those or the preceding bytecode to deopt, I also added them to the falls_through() function. Hmm ok, I see. Might be worth investing just a little more time to see if we cannot get that done. Or else argue why it CANNOT be done. But then we might as well put an assert inside `falls_through` for those cases, to check that we actually never deopt like that at such an opcode, and revisit that assumption if we ever hit the assert. What do you think? 3) For the test: It's a bit of a shame to have lots of separate files. Especially because the directory name `test/hotspot/jtreg/compiler/interpreter/verifyStack/` may at some point have more tests, and then it gets a little confusing. I wonder if `A` and `B` could be nested classes in the java file? 
And the java and jasm file could be named very similarly, so that it is directly clear that they belong together when browsing the test files. An alternative: create a subdirectory that has a very unique name, so that we could separate things that way. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25118#pullrequestreview-2825371630 From yzheng at openjdk.org Thu May 8 14:57:10 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 8 May 2025 14:57:10 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v3] In-Reply-To: References: Message-ID: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. 
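To make the saving described above concrete, here is a rough sketch in Java. It is not the JVMCI `RegisterConfig` API; the class and method names are hypothetical. It assumes the Windows x64 rule that XMM6-XMM15 are callee-saved in their low 128 bits only, and it assumes that an unrefined configuration would save the full 512-bit width of each such register:

```java
// Hypothetical illustration only; none of these names belong to the JVMCI API.
public class CalleeSaveWidthSketch {
    static final int XMM_BYTES = 16; // low 128 bits, preserved across calls
    static final int ZMM_BYTES = 64; // full 512-bit width (assumed unrefined case)

    // Windows x64 ABI: among vector registers 0-15, only XMM6-XMM15 are callee-saved.
    static boolean isCalleeSaved(int vectorRegIndex) {
        return vectorRegIndex >= 6 && vectorRegIndex <= 15;
    }

    // Bytes a compiler would spill for one register, with or without a refined storage kind.
    static int bytesToSave(int vectorRegIndex, boolean refined) {
        if (!isCalleeSaved(vectorRegIndex)) {
            return 0;
        }
        return refined ? XMM_BYTES : ZMM_BYTES;
    }

    // Total spill size over vector registers 0..15.
    static int totalBytes(boolean refined) {
        int total = 0;
        for (int i = 0; i < 16; i++) {
            total += bytesToSave(i, refined);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("unrefined: " + totalBytes(false) + " bytes");
        System.out.println("refined:   " + totalBytes(true) + " bytes");
    }
}
```

Under these assumptions the refined storage kind cuts the spill area for the ten callee-saved vector registers to a quarter of the full-width size.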
Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: Update javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24451/files - new: https://git.openjdk.org/jdk/pull/24451/files/fcdfd10d..bc900518 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24451&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24451.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24451/head:pull/24451 PR: https://git.openjdk.org/jdk/pull/24451 From dnsimon at openjdk.org Thu May 8 14:57:11 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 8 May 2025 14:57:11 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v3] In-Reply-To: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> References: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> Message-ID: On Thu, 8 May 2025 14:54:36 GMT, Yudi Zheng wrote: >> Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Update javadoc Still good. ------------- Marked as reviewed by dnsimon (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24451#pullrequestreview-2825424244 From epeter at openjdk.org Thu May 8 15:04:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 15:04:53 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock In-Reply-To: <0W1NJ3CRAdeugnJTYVGVtomqYJEX5QdVEua9XPSWn5g=.d5b30054-2805-4b8c-a9f9-5b1cdbc12d2a@github.com> References: <0W1NJ3CRAdeugnJTYVGVtomqYJEX5QdVEua9XPSWn5g=.d5b30054-2805-4b8c-a9f9-5b1cdbc12d2a@github.com> Message-ID: On Thu, 8 May 2025 14:54:05 GMT, Emanuel Peter wrote: > Even though I was not able to reproduce the same crash with {d,f,i,l}return because I could not get those or the preceding bytecode to deopt, I also added them to the falls_through() function. Basically, there are 2 cases: - opcodes that deopt and retry: these were already there, as far as I know, and @dean-long added them in his previous patch. So here we could only take opcodes that: deopt, retry, and do not have fall-through. - opcodes that deopt but do not retry, but skip forward to the next op, that we then have to check for fall-through. For the deopting opcode, there are 2 categories: - Those that put something on the stack, like `getstatic` that puts whatever it got on the stack. This constrains what opcode comes after. If it returns an object/null, you can only do `return` (ignore stack value) or `areturn` (return that stack value). But you cannot do `ireturn` because the value on the stack is an object, not int. - Those that put nothing on the stack. Here we would not be constrained, and could push whatever we need on the stack before that opcode. E.g. we could push an int before that opcode, and then do `ireturn`. But I'm not sure if there are any such opcodes that deopt but push nothing on the stack. Worth checking though! 
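The case distinction above depends on knowing which bytecodes never continue at the textually next bytecode. As a rough illustration only, the opcodes named in this thread can be classified by their JVM specification opcode values. This is not HotSpot's `falls_through()` from `deoptimization.cpp`; the class and method names are hypothetical, and treating `jsr`, `ret` and the switch bytecodes as non-fall-through is an assumption taken from the review comments, not a confirmed property of the fix:

```java
import java.util.Set;

// Hypothetical helper, not HotSpot code: classifies JVM opcodes by whether
// execution can continue directly at the textually next bytecode.
public class FallThroughSketch {
    // Opcode values as defined by the JVM specification.
    static final int GOTO = 0xa7, JSR = 0xa8, RET = 0xa9;
    static final int TABLESWITCH = 0xaa, LOOKUPSWITCH = 0xab;
    static final int IRETURN = 0xac, LRETURN = 0xad, FRETURN = 0xae;
    static final int DRETURN = 0xaf, ARETURN = 0xb0, RETURN = 0xb1;
    static final int GETSTATIC = 0xb2, ATHROW = 0xbf;
    static final int GOTO_W = 0xc8, JSR_W = 0xc9;

    // Assumption: every unconditional control transfer counts as "does not fall through".
    private static final Set<Integer> NO_FALL_THROUGH = Set.of(
        GOTO, GOTO_W, JSR, JSR_W, RET, TABLESWITCH, LOOKUPSWITCH,
        IRETURN, LRETURN, FRETURN, DRETURN, ARETURN, RETURN, ATHROW);

    static boolean fallsThrough(int opcode) {
        return !NO_FALL_THROUGH.contains(opcode);
    }

    public static void main(String[] args) {
        System.out.println("getstatic falls through: " + fallsThrough(GETSTATIC));
        System.out.println("areturn falls through:   " + fallsThrough(ARETURN));
    }
}
```

With a table like this, the reproducer's `getstatic` followed by `areturn` is the case where the deopt resumes at a bytecode whose successor in the code array may be unreachable.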
Hope this case distinction helps a little, I'm not sure it is particularly clear or accurate, but these are the things I would look into if I were working on this bug :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2863377063 From epeter at openjdk.org Thu May 8 15:09:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 8 May 2025 15:09:53 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock In-Reply-To: References: Message-ID: On Thu, 8 May 2025 13:22:55 GMT, Manuel Hässig wrote: > # Issue Summary > > This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: > > getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" > areturn; > // The following is unreachable > iconst_0; > > > This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. 
Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: > https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 > > # Change Summary > > To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) > - [ ] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing > > # Acknowledgements > Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. Quickly scanning, I see these that also may or may not have a fall-through: `lookupswitch`, `tableswitch` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2863398487 From kvn at openjdk.org Thu May 8 15:14:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:14:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <40ZOuLCtxa6ytFKxGHY5mHY_SI_e1AxrXSUrpmNB9Lk=.17f141ca-5b1e-4ead-8416-86f5b7382598@github.com> On Tue, 6 May 2025 18:57:14 GMT, Roberto Castañeda Lozano wrote: > > Why the attribute is not set for `zLoadP` on x64? 
> > `ins_is_late_expanded_null_check_candidate` is set to `true` for `zLoadP` in [src/hotspot/cpu/x86/gc/z/z_x86_64.ad (line 121)](https://github.com/openjdk/jdk/pull/25066/files#diff-183d5784f9317f5582b267d82e7afa4e23ae137671fab8ba9cb5b502dae52b3dR121), or did I misunderstand your question? Somehow I missed this change. Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2863416833 From dlunden at openjdk.org Thu May 8 15:21:34 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:21:34 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v3] In-Reply-To: References: Message-ID: > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. > > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. > - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. > - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. 
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Updates after comments from Roberto ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24926/files - new: https://git.openjdk.org/jdk/pull/24926/files/3d11b554..1ba0dcff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=01-02 Stats: 29 lines in 3 files changed: 3 ins; 7 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926 PR: https://git.openjdk.org/jdk/pull/24926 From dlunden at openjdk.org Thu May 8 15:21:34 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:21:34 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> Message-ID: On Wed, 7 May 2025 14:29:07 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after reviews > > src/hotspot/share/opto/gcm.cpp line 667: > >> 665: >> 666: //------------------------raise_above_anti_dependences--------------------------- >> 667: // The argument load has a current scheduling range in the dominator tree that > > Could you start this long comment with a one-sentence summary of what the function does? I.e. something like "Enforce a scheduling of the given load where its input memory state is not overwritten by an anti-dependent store". Sure! 
I went with // Enforce a scheduling of the argument load that ensures anti-dependent stores // do not overwrite the load's input memory state before the load executes. > src/hotspot/share/opto/gcm.cpp line 710: > >> 708: // B4, which means that the updated LCA is B2. Now, consider the store in B2. >> 709: // Raising the LCA above B2 has no effect, because B2 is on the dominator tree >> 710: // branch between early and the current LCA (in fact, B2 is the current LCA). > > I found this sentence a bit unclear, could you clarify what you mean by "has no effect"? Now simplified, does it look better? What I meant is that we do not need to raise the LCA because it is already high enough. > src/hotspot/share/opto/gcm.cpp line 723: > >> 721: // edges back to the load. The caller is expected to eventually schedule the >> 722: // load in the LCA, but may also hoist the load above the LCA, if it is not the >> 723: // early block. > > What code expects the caller to schedule the load in the LCA? Maybe rephrase into something more relaxed like e.g. "The caller may schedule the load in the LCA, or it may hoist the load above the LCA, if it is not the early block.". Good point, I applied your suggestion as is. This is an old comment, not sure why it was formulated in this way. 
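The remark that the LCA is "already high enough" can be illustrated with a generic lowest-common-ancestor walk over a dominator tree. The following is a simplified model, not HotSpot's implementation: the class name, the array-based tree, and the example tree are all hypothetical, and it assumes that raising the LCA above a store's block amounts to taking the dominator-tree LCA of the current LCA and that block:

```java
// Hypothetical model of a dominator tree; not HotSpot's Block class.
public class DomLcaSketch {
    // idom[b] is the immediate dominator of block b; the root is its own idom.
    // depth[b] is the distance of block b from the root.
    static int domLca(int[] idom, int[] depth, int a, int b) {
        // Bring both blocks to the same depth, then climb in lockstep.
        while (depth[a] > depth[b]) a = idom[a];
        while (depth[b] > depth[a]) b = idom[b];
        while (a != b) {
            a = idom[a];
            b = idom[b];
        }
        return a;
    }

    public static void main(String[] args) {
        // Example tree (made up for this sketch): B0 -> B1 -> {B2 -> B4, B3}
        int[] idom  = {0, 0, 1, 1, 2};
        int[] depth = {0, 1, 2, 2, 3};

        int lca = 4;                         // load initially placed as late as B4
        lca = domLca(idom, depth, lca, 2);   // store in B2 raises the LCA to B2
        System.out.println("after store in B2: B" + lca);
        lca = domLca(idom, depth, lca, 2);   // a second store in B2 changes nothing
        System.out.println("after another store in B2: B" + lca);
    }
}
```

In this model, a store whose block already lies on the dominator-tree path between early and the current LCA leaves the LCA where it is, which matches the "no need to raise" reading given in the reply.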
> src/hotspot/share/opto/gcm.cpp line 758: > >> 756: >> 757: // Note the earliest legal placement of 'load', as determined by >> 758: // by the unique point in the dominator tree where all memory effects > > Suggestion: > > // the unique point in the dominator tree where all memory effects Thanks, fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079929902 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079931275 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079933910 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079934375 From kvn at openjdk.org Thu May 8 15:24:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:24:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). src/hotspot/share/opto/lcm.cpp line 95: > 93: } > 94: > 95: void PhaseCFG::maybe_hoist_into(Node* n, Block* b) { Consider adding asserts into these 2 new methods to make sure that they operate only on **data** and not control nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2079942627 From dlunden at openjdk.org Thu May 8 15:25:53 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:25:53 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> Message-ID: On Wed, 7 May 2025 14:31:57 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after reviews > > src/hotspot/share/opto/gcm.cpp line 779: > >> 777: ResourceArea* area = Thread::current()->resource_area(); >> 778: >> 779: // Bookkeeping of possibly anti-dependent stores that we find outside of the > > Suggestion: > > // Bookkeeping of possibly anti-dependent stores that we find below the Technically, "outside of" is more appropriate here, because the stores that we bookkeep are not necessarily dominated by ("below") early. Since the search starts from `initial_mem`, which can be in a much earlier block than early, stores that we bookkeep can also be above early, or on completely distinct control-flow paths that do not even go through early. But, you are correct that only stores below early matter in the end. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079940985 From kvn at openjdk.org Thu May 8 15:28:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 8 May 2025 15:28:59 GMT Subject: RFR: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion In-Reply-To: References: Message-ID: <6lKWcfOXGmLPgKy-LWq7WmB1tInsfUbv_rVVBL_qzDA=.d9dd055d-42dd-49db-9312-f49487eaecf4@github.com> On Thu, 8 May 2025 01:48:17 GMT, Vladimir Ivanov wrote: > Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. 
The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. > > Proposed fix adjusts the assert. > > Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. > > Testing: hs-tier1 - hs-tier5 Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25110#pullrequestreview-2825531981 From dlunden at openjdk.org Thu May 8 15:31:42 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:31:42 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v4] In-Reply-To: References: Message-ID: > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. > > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. > - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. > - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. 
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Rename must_raise_LCA to must_raise_LCA_above_marks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24926/files - new: https://git.openjdk.org/jdk/pull/24926/files/1ba0dcff..e4fb8a0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926 PR: https://git.openjdk.org/jdk/pull/24926 From dlunden at openjdk.org Thu May 8 15:31:43 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:31:43 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> Message-ID: On Wed, 7 May 2025 14:33:16 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after reviews > > src/hotspot/share/opto/gcm.cpp line 785: > >> 783: Node_List non_early_stores(area); >> 784: >> 785: // Flag that indicates if we must attempt to raise the LCA after the main > > Suggestion: > > // Whether we must attempt to raise the LCA after the main > > Also, could you clarify what you mean by "attempt"? Could LCA raising fail somehow? Thanks, poor choice of wording here. What I mean is that we may call `raise_LCA_above_marks`, but the result could still be an unchanged LCA. 
I just simplified this description now and do not mention "attempt". Come to think of it, a less ambiguous name for this variable is `must_raise_LCA_above_marks`. I'll update it. > src/hotspot/share/opto/gcm.cpp line 803: > >> 801: // MergeMems do not modify the memory state. Anti-dependent stores or memory >> 802: // Phis may, however, exist downstream of MergeMems. Therefore, we must >> 803: // permit the search to continue through MergeMems. Memory-state-modifying > > Now that you have already explained that "memory-state-modifying nodes" are also referred to as "stores", you could stick to using "stores" for brevity. Yes, good point. Updated. > src/hotspot/share/opto/gcm.cpp line 850: > >> 848: // - just past a MergeMem with the edge (MergeMem, use_mem_state). >> 849: // we have passed a MergeMem and are now at an edge >> 850: // (MergeMem, use_mem_state). > > Are these two lines intended to be here? Oops, must be a leftover from an edit. Removed, thanks. > src/hotspot/share/opto/gcm.cpp line 853: > >> 851: assert(def_mem_state == nullptr || def_mem_state == initial_mem || >> 852: def_mem_state->is_MergeMem(), >> 853: "invariant failed"); > > Suggestion: > > "unexpected memory state"); Thanks, fixed > src/hotspot/share/opto/gcm.cpp line 892: > >> 890: >> 891: // At this point, use_mem_state is either a store or a memory Phi. >> 892: assert(!use_mem_state->is_MergeMem(), "invariant failed"); > > Suggestion: > > assert(!use_mem_state->is_MergeMem(), > "use_mem_state should be either a store or a memory Phi"); Thanks, fixed > src/hotspot/share/opto/gcm.cpp line 951: > >> 949: // which we must raise the LCA above (set_raise_LCA_mark), and keep >> 950: // track of nodes that potentially need anti-dependence edges >> 951: // (non_early_stores). The only exceptions to this is if we > > Suggestion: > > // (non_early_stores). 
The only exceptions to this are if we Thanks, fixed > src/hotspot/share/opto/gcm.cpp line 957: > >> 955: // >> 956: // After the worklist loop, we perform an efficient combined LCA-raising >> 957: // operation over all marks and then only add anti-dependence edges where > > Suggestion: > > // operation over all marks and only then add anti-dependence edges where Thanks, fixed > src/hotspot/share/opto/gcm.cpp line 1014: > >> 1012: pred_block->set_raise_LCA_mark(load_index); >> 1013: must_raise_LCA = true; >> 1014: } else /* if (pred_block == early */ { > > Suggestion: > > } else /* if (pred_block == early) */ { Thanks, fixed > src/hotspot/share/opto/gcm.cpp line 1052: > >> 1050: } >> 1051: } >> 1052: // (Worklist is now empty; we have visited all possible anti-dependences.) > > Suggestion: > > // Worklist is now empty; we have visited all possible anti-dependences. Thanks, fixed > test/hotspot/jtreg/compiler/loopopts/TestSplitIfPinnedLoadInStripMinedLoop.java line 141: > >> 139: >> 140: // Same as test2 but with reference to inner loop induction variable 'j' and different order of instructions. >> 141: // Triggered an assert in PhaseCFG::raise_above_anti_dependences if loop strip mining verification was disabled: > > If the proposed assertions are stronger than the one in mainline, there is no need to rewrite this sentence in past tense, in my opinion. I agree, better to not change more than necessary. I've reverted the change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079949106 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079950578 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079951174 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079951703 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079951928 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079952140 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079952361 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079952544 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079952672 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2079953516 From dlunden at openjdk.org Thu May 8 15:33:00 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:33:00 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: <0wriUZC04h3jkDi6xlcaDV-2ZtDFJLQiauul5Kq8WIs=.2e1b282d-7da6-4132-a2e1-f30d2f65463b@github.com> On Wed, 7 May 2025 12:23:46 GMT, Daniel Lundén wrote: >>> The issue with weakening the per-iteration assertion in special cases is that we _must_ ensure that we do not grow by more than `max_live_nodes_increase_per_iteration` in a single iteration. Below is my failure analysis for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) which describes the issue. >> >> Fair enough, thanks for the explanation. > > Thanks for the review @robcasloz! > > @eme64 >> @dlunde Ok, then let's declare this as a "quickfix", and file a follow-up RFE. Maybe it should also be declared a lower-priority bug? 
> > Sounds good to me. Yes, definitely lower priority for now. We do not even know if the transformation is expected or not, although I agree it looks suspicious. > @dlunde Approved, with the assumption that you will file that follow-up RFE and link it with this issue :) @eme64 Thanks for the review! Of course, already made a note. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2863465426 From dlunden at openjdk.org Thu May 8 15:35:53 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 8 May 2025 15:35:53 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> Message-ID: On Wed, 7 May 2025 14:52:09 GMT, Roberto Castañeda Lozano wrote: > Thanks for doing this, Daniel! `insert_anti_dependences` is indeed easier to understand after your proposed cleanups and additional comments. I have a few questions and suggestions. Thanks for the review @robcasloz! > Please update also the reference to `insert_anti_dependencies` in `src/hotspot/share/adlc/output_h.cpp`. Good catch, I only `grep`ped for insert_anti_dependences :slightly_smiling_face: ------------- PR Comment: https://git.openjdk.org/jdk/pull/24926#issuecomment-2863474583 From qamai at openjdk.org Thu May 8 16:56:53 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 8 May 2025 16:56:53 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result In-Reply-To: References: Message-ID: On Wed, 7 May 2025 16:05:53 GMT, Srinivas Vamsi Parasa wrote: > This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. 
> > The test passes after using this fix as shown below: > > Passed: compiler/c2/irTests/TestFPComparison.java > Test results: passed: 1 > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java > 1 1 0 0 0 > ============================== > TEST SUCCESS src/hotspot/cpu/x86/x86_64.ad line 6273: > 6271: ins_cost(200); > 6272: format %{ "ecmovpl $dst, $src1, $src2\n\t" > 6273: "cmovnel $dst, $src2" %} You need `effect(TEMP dst)` for these nodes, too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25101#discussion_r2080104541 From vlivanov at openjdk.org Thu May 8 17:55:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 May 2025 17:55:58 GMT Subject: RFR: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion In-Reply-To: References: Message-ID: On Thu, 8 May 2025 01:48:17 GMT, Vladimir Ivanov wrote: > Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. > > Proposed fix adjusts the assert. > > Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. > > Testing: hs-tier1 - hs-tier5 Thanks for the reviews, Tobias and Vladimir. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25110#issuecomment-2863830025 From vlivanov at openjdk.org Thu May 8 17:56:00 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 May 2025 17:56:00 GMT Subject: RFR: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion In-Reply-To: References: Message-ID: On Thu, 8 May 2025 09:14:00 GMT, SendaoYan wrote: >> Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. 
>> >> Proposed fix adjusts the assert. >> >> Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. >> >> Testing: hs-tier1 - hs-tier5 > > test/hotspot/jtreg/compiler/vectorapi/VectorBoxExpandTest.java line 44: > >> 42: private static int[] iarr = new int[ARR_LEN]; >> 43: private static IntVector g; >> 44: private static int acc = 0; > > Maybe we should update the copyright year. The file has non-Oracle legal notice, so I'm not eligible to change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25110#discussion_r2080189978 From vlivanov at openjdk.org Thu May 8 17:56:01 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 May 2025 17:56:01 GMT Subject: Integrated: 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion In-Reply-To: References: Message-ID: On Thu, 8 May 2025 01:48:17 GMT, Vladimir Ivanov wrote: > Some Vector API tests fail with an assert during `PhaseVector::expand_vbox_node()`. The assert itself is the culprit, since it doesn't cover the case when VBox node is already expanded. > > Proposed fix adjusts the assert. > > Also, extended `PhaseVector::optimize_vector_boxes()` with `StressMacroExpansion` support. > > Testing: hs-tier1 - hs-tier5 This pull request has now been integrated. 
Changeset: b7b437d5 Author: Vladimir Ivanov URL: https://git.openjdk.org/jdk/commit/b7b437d5bd579a7a90a90470979768cdd085728c Stats: 11 lines in 2 files changed: 9 ins; 0 del; 2 mod 8356453: C2: assert(!vbox->is_Phi()) during vector box expansion Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25110 From jbhateja at openjdk.org Thu May 8 19:21:31 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 8 May 2025 19:21:31 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: References: Message-ID: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Addressing Yudi's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/1a3bce93..c65f0777 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=16-17 Stats: 7 lines in 5 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From vlivanov at openjdk.org Thu May 8 19:23:59 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 8 May 2025 19:23:59 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> References: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> Message-ID: On Thu, 8 May 2025 19:21:31 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? 
AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Addressing Yudi's comments Testing results (hs-tier1 - hs-tier4) are clean. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2826156052 From duke at openjdk.org Thu May 8 19:31:43 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 8 May 2025 19:31:43 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v15] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 54 additional commits since the last revision: - Fix null check - Remove unnecessary include - Add nullptr check to relocate - Fix JVMCI nmethod data - Unexclude JVMCI methods - Add relocate_nmethod_mirror - Only hold NMethodState_lock when needed - Exclude JVMCI nmethods - Remove StressNMethodRelocation - Fix branch_range revert - ... and 44 more: https://git.openjdk.org/jdk/compare/2fd6e608...9ca3563a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/027f5245..9ca3563a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=13-14 Stats: 406484 lines in 4932 files changed: 132671 ins; 253814 del; 19999 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From yzheng at openjdk.org Thu May 8 19:40:00 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 8 May 2025 19:40:00 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> References: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> Message-ID: On Thu, 8 May 2025 19:21:31 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. 
>> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Addressing Yudi's comments CPU features in Graal remain the same after this PR. Passed all Graal compiler unit tests. ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2826187636 From sparasa at openjdk.org Thu May 8 19:43:06 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 8 May 2025 19:43:06 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v2] In-Reply-To: References: Message-ID: > This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. 
> > The test passes after using this fix as shown below: > > Passed: compiler/c2/irTests/TestFPComparison.java > Test results: passed: 1 > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java > 1 1 0 0 0 > ============================== > TEST SUCCESS Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add missing predicates for cmovP_regUCF ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25101/files - new: https://git.openjdk.org/jdk/pull/25101/files/f2588f68..ec2959b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25101&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25101&range=00-01 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25101.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25101/head:pull/25101 PR: https://git.openjdk.org/jdk/pull/25101 From sparasa at openjdk.org Thu May 8 19:43:06 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 8 May 2025 19:43:06 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result In-Reply-To: <1s6dHPf6iddBRe5ide_kmaws8HRKfm80gRWjE0raZ7w=.c549a64f-cebb-4632-bcf8-64c55f9ff2d0@github.com> References: <1s6dHPf6iddBRe5ide_kmaws8HRKfm80gRWjE0raZ7w=.c549a64f-cebb-4632-bcf8-64c55f9ff2d0@github.com> Message-ID: <8h2CWUkyvoaO0D9Q4wkKzYanexo_end41vw_HPZLAvc=.85c0dbbc-7cc8-4bd4-ad83-52210a69bfd7@github.com> On Thu, 8 May 2025 00:16:24 GMT, Sandhya Viswanathan wrote: > cmovP_regUCF_ndd instruct doesn't have UseAPX as predicate. Please see the missing predicates added in the updated code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25101#issuecomment-2864093107 From duke at openjdk.org Thu May 8 20:01:56 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 8 May 2025 20:01:56 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v13] In-Reply-To: References: <1_OG6abFZwl9AWbxsm5eCrL6RWq1wTnPngdDky6V3f8=.3a126cd0-22c0-45bb-9f85-3f096de116d6@github.com> Message-ID: On Fri, 25 Apr 2025 22:49:55 GMT, Chad Rakoczy wrote: >> ~I did this because during code buffer expansion `pd_set_call_destination` gets called but there is no relocation info at that time. So with debug builds it was incorrectly trying to find a trampoline stub that did not exist yet because it believed it needed to when it didn't. I agree this is probably not the best approach though and I will look for a better solution~ > > Actually the issue is not during code buffer expansion. It's called when creating a new nmethod that I can only get to occur when using the Graal compiler. So it may not be true that calls always have trampolines in the case of Graal. This _fix_ may just make the bug harder to encounter. For debug builds HotSpot uses the 2M range to determine if there should be a trampoline or not for a call. Graal uses 128M regardless of debug or release builds. This means that Graal-compiled methods may not have trampolines, but this check will expect them to.
I reverted this change, as it just means there is a difference in how Graal and HotSpot determine the maximum branch range. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2080383640 From duke at openjdk.org Thu May 8 20:28:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 8 May 2025 20:28:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v7] In-Reply-To: References: Message-ID: <72OW9wHbET022fBnWx1Wdxb_J9pbH2sLiAqlC9fGb-c=.6930c0b1-33bb-4c49-af02-11e2c79dbaf2@github.com> On Fri, 25 Apr 2025 22:31:28 GMT, Erik Österlund wrote: >>> Hi @fisk, >>> >>> Thank you for the very valuable comment. It raises points we have not thought about. >>> >>> > I am not fond of "special nmethods" that work subtly different to normal nmethods and have their own special life cycles. >>> >>> It's not clear to me what you mean by "special nmethods". IMO we don't introduce any special nmethods. From my point of view, a normal nmethod is an nmethod for an ordinary Java method. Nmethods for non-ordinary Java methods are special, e.g. native nmethods or method handle linkers (JDK-8263377). I think normal nmethods should be relocatable within CodeCache. >> >> I mean nmethods with a subtly different life cycle where usual invariants/expectations don't hold. Like method handle intrinsics and enter special intrinsics for example. Used to have a different life cycle for OSR nmethods too. >> >>> > You can't just copy oops. >>> >>> Yes, this is the main issue at the moment. Can we do this at a safepoint? >> >> I don't think it solves much. You can't stash away a pointer to the nmethod, roll to a safepoint, and expect the nmethod to not be freed. Even if you did, you still can't copy the oops. >> >> If we are to do this, I think you want to apply nmethod entry barriers first. That stabilizes the oops. >> >>> > I'm worried about copying the nmethod epoch counters >>> >>> We should clear them. If not, it is a bug.
>> >> I'd like to change copying from opt-out to opt-in instead; that would make me feel more comfortable. Then perhaps you can share initialization code that sets up the initial state of the nmethod exactly in the same way as normal nmethods. >> >> I didn't check but you need to take the Compile_lock and verify dependencies too if you didn't do that, I think, so you don't race with deoptimization. >> >>> > You don't check if the nmethod is_unloading() when cloning it. >>> >>> Should such nmethods be not entrant? We don't relocate not entrant nmethods. >> >> is_not_entrant doesn't imply is_unloading. >> >>> > What are the consequences of copying the deoptimization generation? >>> >>> What do you mean? >> >> I mean is it safe to racingly copy the deoptmization generation when there is concurrent deoptimization? This is why I'd prefer copying to be opt-in rather than opt-out so we don't have to stare at every single field and wonder what will happen when a new nmethod "inherits" state from a different nmethod in interesting races. I want it to work as much as possible as normal nmethod installation, starting with a state as close as po... > >> @fisk Thank you for the valuable feedback. Here is a more detailed response to the concerns you brought up > > Thanks, it's shaping up. > >> Instead of tracking the nmethod pointer which could become stale I updated the code to use method handles. I believe the method handle should ensure the method remains valid and we can then relocate its corresponding nmethod. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/runtime/vmOperations.cpp#L106-L110) > > The safepoint is still causing more trouble than it solves. It was introduced due to oop phobia. What the oops really needed to stabilize is to run the entry barrier which you do now. The safepoint merely destabilizes the oops again while introducing latency problems and fun class redefinition interactions. 
It should be removed as I can't see it serves any purpose. > >> The relocated nmethod is added as a dependent nmethod on all of the MethodHandles and InstranceKlass in its dependency scope. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1543-L1564) > > My concern was about something else - a table tracks all the nmethods that have old metadata in order to speed up a walk over the code cache that finds said nmethods. > > This should be dealt with by not relocating nmethods with evol dependencies/metadata and by not safepointing, which could introduce class redefinition which populates this table. > >> The source nmethod entry barrier is now called before copying. I believe this will disarm the barrier and reset the guard value for it to be safe to copy. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1530) > > Yes and fix the oops so you don't need a safepoint. > >> Copying this value was not intentional. It should be correctly set to the default value now. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1441) > > Good. > >> I added this check to ensure the nmethod is not unloading and removed the not entrant check as is unloading implies not entrant. [Reference](https://github.com/chadrako/jdk/blob/027f5245a6829e79bd8624c1cca542c4c24ace5c/src/hotspot/share/code/nmethod.cpp#L1583-L1585) > > That's not quite true. There are two separate mechanisms that guard the entry. When sn nmethod becomes invalid due to for example a broken speculative assumpti... @fisk I believe I have addressed the remaining issues you brought up > The safepoint is still causing more trouble than it solves. It was introduced due to oop phobia. What the oops really needed to stabilize is to run the entry barrier which you do now. 
The safepoint merely destabilizes the oops again while introducing latency problems and fun class redefinition interactions. It should be removed as I can't see it serves any purpose. Relocation no longer occurs at a safepoint > My concern was about something else - a table tracks all the nmethods that have old metadata in order to speed up a walk over the code cache that finds said nmethods. > This should be dealt with by not relocating nmethods with evol dependencies/metadata and by not safepointing, which could introduce class redefinition which populates this table. I added the check for evol dependecies/metadata to not relocate them ([reference](https://github.com/chadrako/jdk/blob/9ca3563a0fe8e021a7a99107a4b675d2210a34b2/src/hotspot/share/code/nmethod.cpp#L1616)) > Okay. Speaking of which, seems like the NMethodState_lock is held for way too long - usually just held when setting the Method code and updating the nmethod state after the initial state is set. Keeping the lock across other things makes me worried of deadlocks. The NMethodState_lock is now only held when the state gets updated ([reference](https://github.com/chadrako/jdk/blob/9ca3563a0fe8e021a7a99107a4b675d2210a34b2/src/hotspot/share/code/nmethod.cpp#L1575)) > Have you checked what the JVMCI speculation data and JVMCI data contains and if your approach will break that? JVMCI has an nmethod mirror object that refers back to the nmethod - this is unlikely to work out of the box with cloning. 
The HotspotNMethod represented by the nmethod mirror is now updated to reflect the new nmethod ([reference](https://github.com/chadrako/jdk/blob/9ca3563a0fe8e021a7a99107a4b675d2210a34b2/src/hotspot/share/code/nmethod.cpp#L1582-L1583) [reference](https://github.com/chadrako/jdk/blob/9ca3563a0fe8e021a7a99107a4b675d2210a34b2/src/hotspot/share/jvmci/jvmciRuntime.cpp#L851-L863)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2864228319 From eosterlund at openjdk.org Thu May 8 21:24:06 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 8 May 2025 21:24:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v15] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 19:31:43 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 54 additional commits since the last revision: > > - Fix null check > - Remove unnecessary include > - Add nullptr check to relocate > - Fix JVMCI nmethod data > - Unexclude JVMCI methods > - Add relocate_nmethod_mirror > - Only hold NMethodState_lock when needed > - Exclude JVMCI nmethods > - Remove StressNMethodRelocation > - Fix branch_range revert > - ... and 44 more: https://git.openjdk.org/jdk/compare/a077da0e...9ca3563a src/hotspot/share/jvmci/jvmciRuntime.cpp line 852: > 850: > 851: void JVMCINMethodData::relocate_nmethod_mirror(nmethod* nm) { > 852: oop nmethod_mirror = get_nmethod_mirror(nm, /* phantom_ref */ false); Why is phantom false? src/hotspot/share/jvmci/jvmciRuntime.cpp line 858: > 856: > 857: JVMCIEnv* jvmciEnv = nullptr; > 858: HotSpotJVMCI::InstalledCode::set_address(jvmciEnv, nmethod_mirror, (jlong)(nm)); What's the sync story here? Any lock protecting this? If not, I wonder if readers are okay with inconsistencies. I haven't checked. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2080483634 PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2080485969 From sviswanathan at openjdk.org Thu May 8 22:20:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 8 May 2025 22:20:52 GMT Subject: RFR: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding [v2] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:21:54 GMT, Jatin Bhateja wrote: >> This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 >> >> ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. 
While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] >> >> In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. >> >> This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 >> >> >> PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24919#pullrequestreview-2826479403 From sviswanathan at openjdk.org Fri May 9 00:03:56 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 00:03:56 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9] In-Reply-To: References: Message-ID: On Sat, 3 May 2025 07:28:04 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/vm_version_x86.cpp line 464: >> >>> 462: __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx >>> 463: __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx >>> 464: __ jccb(Assembler::equal, done); // jump if AVX is not supported >> >> This does not have the same effect as before. Consider an input of 0x10000000: the andl result will not be zero with this code and so the jump to done will not happen. Whereas prior to this change, the cmpl with 0x18000000 will fail for equality and so a jump to done will happen. This is the case for all the places where we are checking more than 1 set bit. > > Thanks @sviswa7, the sub-optimality was mainly around single-bit comparisons, where we could save a redundant CMP after AND by flipping the predicate of the subsequent flag-consuming JMP; multi-bit compares should remain unaltered. This and all the following places with multi-bit checks still need to be fixed. If you walk through stock and new code in this PR when Address(rsi, 8) on line 468 has 0x10000000, you will observe that stock code will jump to done and new code will not jump to done. Let me know if I am missing something.
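The difference between the two checks discussed above can be sketched in plain C++. This is a minimal illustration, not HotSpot code: the helper names `all_bits_set` and `any_bit_set` and the constant name `kOsxsaveAvxMask` are made up for the example, and only the mask value 0x18000000 (osxsave | avx) and the input 0x10000000 are taken from the review comment. The stock sequence (AND, then CMP against the mask) behaves like `all_bits_set`; an AND followed directly by a jump on the zero flag behaves like `any_bit_set`.

```cpp
#include <cstdint>

// Mask value quoted in the review: the osxsave and avx bits of CPUID leaf 1.
// The constant name is an assumption made for this sketch.
constexpr std::uint32_t kOsxsaveAvxMask = 0x18000000u;

// AND followed by CMP against the full mask: true only if every mask bit is set.
constexpr bool all_bits_set(std::uint32_t value, std::uint32_t mask) {
  return (value & mask) == mask;
}

// AND followed by a zero-flag test: true as soon as one mask bit is set.
constexpr bool any_bit_set(std::uint32_t value, std::uint32_t mask) {
  return (value & mask) != 0;
}

// With only one of the two bits set, the two checks disagree.
static_assert(!all_bits_set(0x10000000u, kOsxsaveAvxMask),
              "full-mask compare rejects a partially set value");
static_assert(any_bit_set(0x10000000u, kOsxsaveAvxMask),
              "zero test accepts a partially set value");

// For a single-bit mask the two checks always agree, which is why dropping
// the CMP is only safe in the single-bit case.
static_assert(all_bits_set(0x08000000u, 0x08000000u) ==
              any_bit_set(0x08000000u, 0x08000000u),
              "single-bit masks behave identically");
```

The static assertions are checked at compile time, so the snippet documents the disagreement for the value 0x10000000 without needing to run anything.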
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2080592979 From sviswanathan at openjdk.org Fri May 9 00:03:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 00:03:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v18] In-Reply-To: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> References: <0t720cpyX-RwVGVlm0b9gNbSjeMHWy5cnF-o4xSWRgU=.130e6474-3aa2-48a8-90d1-6f3a69c135ee@github.com> Message-ID: On Thu, 8 May 2025 19:21:31 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Addressing Yudi's comments test/hotspot/jtreg/serviceability/sa/ClhsdbLongConstant.java line 108: > 106: checkLongValue("VM_Version::CPU_SHA ", > 107: longConstantOutput, > 108: 34L); Need to change the comment on line 94 as well. test/lib-test/jdk/test/whitebox/CPUInfoTest.java line 69: > 67: "f16c", "pku", "ospke", "cet_ibt", > 68: "cet_ss", "avx512_ifma", "serialize", "avx_ifma", > 69: "apx_f", "avx10_1", "avx10_2" A minor nit, in between spacing could match previous statement. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2080650055 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2080648091 From dlong at openjdk.org Fri May 9 01:01:03 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 May 2025 01:01:03 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock In-Reply-To: References: Message-ID: On Thu, 8 May 2025 13:22:55 GMT, Manuel Hässig wrote: > # Issue Summary > > This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode.
When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: > > getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" > areturn; > // The following is unreachable > iconst_0; > > > This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: > https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 > > # Change Summary > > To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. 
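The classification described in the change summary above can be sketched as a small standalone function. This is only an illustration under stated assumptions: the enum and its constant names are hypothetical, the numeric values are the standard JVM opcode encodings, and the exact set of bytecodes handled by the real `falls_through()` in `deoptimization.cpp` may differ from the set chosen here.

```cpp
#include <cstdint>

// A handful of JVM opcodes with their class-file encodings.
// The enum and constant names are made up for this sketch.
enum Opcode : std::uint8_t {
  kIconst0      = 0x03,
  kGoto         = 0xa7,
  kTableswitch  = 0xaa,
  kLookupswitch = 0xab,
  kIreturn      = 0xac,
  kLreturn      = 0xad,
  kFreturn      = 0xae,
  kDreturn      = 0xaf,
  kAreturn      = 0xb0,
  kReturn       = 0xb1,
  kGetstatic    = 0xb2,
  kAthrow       = 0xbf,
  kGotoW        = 0xc8
};

// Returns true if execution can continue at the bytecode that directly
// follows `op` in the code array. Unconditional jumps, switches, returns
// and athrow never continue there, so the bytecode after them may be
// unreachable and should not be used for stack verification.
inline bool falls_through(Opcode op) {
  switch (op) {
    case kGoto:
    case kGotoW:
    case kTableswitch:
    case kLookupswitch:
    case kIreturn:
    case kLreturn:
    case kFreturn:
    case kDreturn:
    case kAreturn:
    case kReturn:
    case kAthrow:
      return false;
    default:
      return true;
  }
}
```

Applied to the reproducer, `falls_through(kGetstatic)` is true and `falls_through(kAreturn)` is false, so a verifier using such a predicate would not inspect the unreachable `iconst_0` after the `areturn`.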
> > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) > - [ ] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing > > # Acknowledgements > Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. Making falls_through() handle all cases, including lookupswitch and tableswitch, seems like the right fix. When I added it originally, I was not aware that C2 could set the bci to the next instruction instead of the current instruction. I think this means almost any instruction could be encountered at the "next next" bci. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2864799977 From duke at openjdk.org Fri May 9 02:45:48 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 9 May 2025 02:45:48 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v4] In-Reply-To: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: <1AMdU-khBdc9AMeh3PxdmDPLAKvNdEggLO0478nxODw=.23a032ef-1081-4e88-b65f-e075023e5905@github.com> > I wrote a test to check whether every C2 IR node has a correct size_of() function, and found that some nodes are missing it. These nodes added new fields but did not add a size_of() that reflects the new size. On Linux this has not caused problems so far, because gcc allocates extra space for alignment, which happens to hold these additional `bool` flags. But the test reports failures on Windows, and a problem could also appear if anyone modifies a base class. > > PS: My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it needs many hacks on the IR nodes to make the test run.
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Remove cmp()/hash() for Opaque node ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25081/files - new: https://git.openjdk.org/jdk/pull/25081/files/9356c51d..03f84eb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=02-03 Stats: 6 lines in 1 file changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25081/head:pull/25081 PR: https://git.openjdk.org/jdk/pull/25081 From jbhateja at openjdk.org Fri May 9 05:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 05:31:57 GMT Subject: Integrated: 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding In-Reply-To: References: Message-ID: On Mon, 28 Apr 2025 12:28:55 GMT, Jatin Bhateja wrote: > This is a follow-up PR that fixes the crashes seen after the integration of PR #24664 > > ZGC bookkeeps multiple place holders in barrier code snippets through relocations, these are later used to patch appropriate contents (mostly immediate values) in instruction encoding to save costly comparisons against global state [1]. While most of the relocation records the patching offsets from the end of the instruction, SHL/R instructions used for pointer coloring/uncoloring, compute the patching offset from the starting address of the instruction. This was done to prevent accidental sharing of relocation information with subsequent relocatable instructions, e.g., static call. [2] > > In case the destination register operand of SHL/R instruction is an extended GPR register, we miss accounting additional REX2 prefix byte in the patch offset, thereby corrupting the encoding since runtime patches the primary opcode byte, resulting in an ILLEGAL instruction exception. 
> > This patch fixes reported failures by computing the relocation offset of the SHL/R instruction from the end of the instruction, thereby making the patch offset agnostic to the REX/REX2 prefix. To be safe, we emit a NOP instruction between the SHL/R and the subsequent relocatable instruction. > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://openjdk.org/jeps/439#:~:text=we%20reduce%20this,changes%20phase%3B > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L1873 > > > PS: Validations were performed using the latest Intel Software Development Emulator after modifying the static register allocation order in x86_64.ad file giving preference to EGPRs. This pull request has now been integrated. Changeset: 53ad4b2a Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/53ad4b2ad2664e5056c113543dfaa26647d6ce26 Stats: 7 lines in 1 file changed: 6 ins; 0 del; 1 mod 8355364: [REDO] Missing REX2 prefix accounting in ZGC barriers leads to incorrect encoding Co-authored-by: Axel Boldt-Christmas Reviewed-by: aboldtch, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24919 From duke at openjdk.org Fri May 9 06:07:52 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 9 May 2025 06:07:52 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Thu, 8 May 2025 12:16:45 GMT, Tobias Hartmann wrote: >> I checked some nodes. `cmp/hash` are not always updated for new fields. It looks "nice to have". I have added `cmp/hash` for "EncodeISOArrayNode/ClearArrayNode/OpaqueMultiversioningNode". > > Thanks for doing that but it's not only nice to have, right? GVN might otherwise incorrectly common two different nodes. I found `OpaqueNode` can not have `cmp` and `hash`, it will fail some tests. I think these nodes can not be optimized by GVN.
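The concern about `cmp`/`hash` and GVN can be illustrated with a toy value-numbering table. This is a simplified Java model under stated assumptions, not C2's implementation: `GvnSketch`, `ToyNode`, `flagInIdentity`, `noHash` and `valueNumber` are all hypothetical names. It shows that two nodes differing only in a field that is left out of hash/equality get commoned, while a node that opts out of hashing (loosely analogous to C2's `NO_HASH` convention) is never commoned.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class GvnSketch {
    /** Toy IR node: an opcode, one input id, and a flag that was "added later". */
    static final class ToyNode {
        final String opcode;
        final int input;
        final boolean flag;            // the newly added field
        final boolean flagInIdentity;  // do hash/equals take the new field into account?
        final boolean noHash;          // node opts out of value numbering entirely

        ToyNode(String opcode, int input, boolean flag, boolean flagInIdentity, boolean noHash) {
            this.opcode = opcode;
            this.input = input;
            this.flag = flag;
            this.flagInIdentity = flagInIdentity;
            this.noHash = noHash;
        }

        @Override
        public int hashCode() {
            return flagInIdentity ? Objects.hash(opcode, input, flag) : Objects.hash(opcode, input);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ToyNode)) {
                return false;
            }
            ToyNode n = (ToyNode) o;
            boolean sameBase = opcode.equals(n.opcode) && input == n.input;
            return flagInIdentity ? (sameBase && flag == n.flag) : sameBase;
        }
    }

    private final Map<ToyNode, ToyNode> table = new HashMap<>();

    /** Returns an equivalent node already in the table, or registers and returns n. */
    ToyNode valueNumber(ToyNode n) {
        if (n.noHash) {
            return n; // never commoned
        }
        ToyNode existing = table.putIfAbsent(n, n);
        return existing != null ? existing : n;
    }
}
```

With `flagInIdentity` set to false, a node with `flag == true` and one with `flag == false` map to the same table entry, which is the "incorrectly common two different nodes" case mentioned in the review.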
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2080993730 From thartmann at openjdk.org Fri May 9 06:12:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 9 May 2025 06:12:57 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v6] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 12:25:11 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure, `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop strip mining](https://bugs.openjdk.org/browse/JDK-8186027) may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis: Before the macro nodes are expanded in `expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in the `adjust_strip_mined_loop` function just before it is eliminated. In this case, a `MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before the elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit node elimination in `expand_macro_nodes` is unintuitive, as suggested in [JDK-8325478](https://bugs.openjdk.org/browse/JDK-8325478), and the current fix should be moved along with the other elimination code. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > removing header and modifying method name Looks good, thanks!
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24890#pullrequestreview-2827126929 From chagedorn at openjdk.org Fri May 9 06:43:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 9 May 2025 06:43:52 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v2] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Fri, 9 May 2025 06:05:31 GMT, kuaiwei wrote: >> Thanks for doing that but it's not only nice to have, right? GVN might otherwise incorrectly common two different nodes. > > I found `OpaqueNode` can not have `cmp` and `hash`, it will fail some tests. I think these nodes can not be optimized by GVN. `OpaqueMultiversioningNode` inherits from `Opaque1` which defines `NO_HASH`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2081034451 From duke at openjdk.org Fri May 9 07:16:02 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 9 May 2025 07:16:02 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub Message-ID: When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in the array fill stub. We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue can be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` with `@Param("5") private int size;` on RISC-V platforms with slow misaligned memory accesses. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 This PR fixes the issue by using a small loop to fill the elements for short arrays. Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference:

1. (`@Param("5") private int size;`):

Before:
Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayFill.fillByteArray        5  avgt   12  558.781 ± 1.396  ns/op
ArrayFill.fillIntArray         5  avgt   12   29.346 ± 0.003  ns/op
ArrayFill.fillShortArray       5  avgt   12   30.779 ± 0.004  ns/op
ArrayFill.zeroByteArray        5  avgt   12  559.249 ± 1.909  ns/op
ArrayFill.zeroIntArray         5  avgt   12   29.346 ± 0.002  ns/op
ArrayFill.zeroShortArray       5  avgt   12   30.777 ± 0.006  ns/op

After:
Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayFill.fillByteArray        5  avgt   12   23.977 ± 0.004  ns/op
ArrayFill.fillIntArray         5  avgt   12   29.343 ± 0.004  ns/op
ArrayFill.fillShortArray       5  avgt   12   30.776 ± 0.005  ns/op
ArrayFill.zeroByteArray        5  avgt   12   23.977 ± 0.002  ns/op
ArrayFill.zeroIntArray         5  avgt   12   29.345 ± 0.005  ns/op
ArrayFill.zeroShortArray       5  avgt   12   30.776 ± 0.004  ns/op

2. (`@Param("3") private int size;`):

Before:
Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayFill.fillByteArray        3  avgt   12  428.923 ± 0.409  ns/op
ArrayFill.fillIntArray         3  avgt   12   28.629 ± 0.005  ns/op
ArrayFill.fillShortArray       3  avgt   12  558.872 ± 2.641  ns/op
ArrayFill.zeroByteArray        3  avgt   12  429.744 ± 2.049  ns/op
ArrayFill.zeroIntArray         3  avgt   12   28.628 ± 0.002  ns/op
ArrayFill.zeroShortArray       3  avgt   12  557.682 ± 1.661  ns/op

After:
Benchmark                 (size)  Mode  Cnt    Score   Error  Units
ArrayFill.fillByteArray        3  avgt   12   21.471 ± 0.002  ns/op
ArrayFill.fillIntArray         3  avgt   12   28.631 ± 0.003  ns/op
ArrayFill.fillShortArray       3  avgt   12   20.436 ± 0.288  ns/op
ArrayFill.zeroByteArray        3  avgt   12   21.472 ± 0.002  ns/op
ArrayFill.zeroIntArray         3  avgt   12   28.633 ± 0.003  ns/op
ArrayFill.zeroShortArray       3  avgt   12   20.203 ± 0.137  ns/op

------------- Commit messages: - RISC-V: Small improvement to array fill stub Changes: https://git.openjdk.org/jdk/pull/25135/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25135&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356593 Stats: 63 lines in 1 file changed: 37 ins; 21 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25135/head:pull/25135 PR: https://git.openjdk.org/jdk/pull/25135 From xgong at openjdk.org Fri May 9 07:44:27 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 9 May 2025 07:44:27 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API Message-ID: JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). Two key areas require improvement: 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. Main changes: 1.
Java-side API refactoring:
   - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on architectures like AArch64.
2. C2 compiler IR refactoring:
   - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types.
3. Backend changes:
   - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level.

Performance:
The performance of the relevant JMH benchmarks improves by up to 27% on an X86 AVX512 system. Please see the data below:

Benchmark  Mode  Cnt  Unit  SIZE  Before  After  Gain
GatherOperationsBenchmark.microByteGather128  thrpt  30  ops/ms  64  53682.012  52650.325  0.98
GatherOperationsBenchmark.microByteGather128  thrpt  30  ops/ms  256  14484.252  14255.156  0.98
GatherOperationsBenchmark.microByteGather128  thrpt  30  ops/ms  1024  3664.900  3595.615  0.98
GatherOperationsBenchmark.microByteGather128  thrpt  30  ops/ms  4096  908.312  935.269  1.02
GatherOperationsBenchmark.microByteGather128_MASK  thrpt  30  ops/ms  64  43040.148  44605.580  1.03
GatherOperationsBenchmark.microByteGather128_MASK  thrpt  30  ops/ms  256  12445.650  12928.102  1.03
GatherOperationsBenchmark.microByteGather128_MASK  thrpt  30  ops/ms  1024  3143.728  3294.173  1.04
GatherOperationsBenchmark.microByteGather128_MASK  thrpt  30  ops/ms  4096  801.516  842.951  1.05
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF  thrpt  30  ops/ms  64  40379.343  45255.490  1.12
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF  thrpt  30  ops/ms  256  11103.537  12971.581  1.16
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2767.870  3299.453  1.19
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF  thrpt  30  ops/ms  4096  704.610  840.908  1.19
GatherOperationsBenchmark.microByteGather128_NZ_OFF  thrpt  30  ops/ms  64  49066.340  53365.591  1.08
GatherOperationsBenchmark.microByteGather128_NZ_OFF  thrpt  30  ops/ms  256  14063.326  14286.067  1.01
GatherOperationsBenchmark.microByteGather128_NZ_OFF  thrpt  30  ops/ms  1024  3617.992  3621.272  1.00
GatherOperationsBenchmark.microByteGather128_NZ_OFF  thrpt  30  ops/ms  4096  861.026  938.055  1.08
GatherOperationsBenchmark.microByteGather256  thrpt  30  ops/ms  64  55844.814  48311.847  0.86
GatherOperationsBenchmark.microByteGather256  thrpt  30  ops/ms  256  15139.459  13009.848  0.85
GatherOperationsBenchmark.microByteGather256  thrpt  30  ops/ms  1024  3861.834  3284.944  0.85
GatherOperationsBenchmark.microByteGather256  thrpt  30  ops/ms  4096  938.665  817.673  0.87
GatherOperationsBenchmark.microByteGather256_MASK  thrpt  30  ops/ms  64  43942.924  43144.065  0.98
GatherOperationsBenchmark.microByteGather256_MASK  thrpt  30  ops/ms  256  12461.170  11580.981  0.92
GatherOperationsBenchmark.microByteGather256_MASK  thrpt  30  ops/ms  1024  3168.598  2945.698  0.92
GatherOperationsBenchmark.microByteGather256_MASK  thrpt  30  ops/ms  4096  803.515  738.049  0.91
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF  thrpt  30  ops/ms  64  42197.440  43209.913  1.02
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF  thrpt  30  ops/ms  256  11456.761  11713.265  1.02
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2732.576  2949.724  1.07
GatherOperationsBenchmark.microByteGather256_MASK_NZ_OFF  thrpt  30  ops/ms  4096  726.062  744.774  1.02
GatherOperationsBenchmark.microByteGather256_NZ_OFF  thrpt  30  ops/ms  64  52915.781  49520.027  0.93
GatherOperationsBenchmark.microByteGather256_NZ_OFF  thrpt  30  ops/ms  256  14481.921  13496.835  0.93
GatherOperationsBenchmark.microByteGather256_NZ_OFF  thrpt  30  ops/ms  1024  3632.065  3362.372  0.92
GatherOperationsBenchmark.microByteGather256_NZ_OFF  thrpt  30  ops/ms  4096  892.825  845.809  0.94
GatherOperationsBenchmark.microByteGather512  thrpt  30  ops/ms  64  54528.404  54478.751  0.99
GatherOperationsBenchmark.microByteGather512  thrpt  30  ops/ms  256  15018.181  14673.727  0.97
GatherOperationsBenchmark.microByteGather512  thrpt  30  ops/ms  1024  3824.690  3589.530  0.93
GatherOperationsBenchmark.microByteGather512  thrpt  30  ops/ms  4096  923.601  906.245  0.98
GatherOperationsBenchmark.microByteGather512_MASK  thrpt  30  ops/ms  64  41248.192  42201.455  1.02
GatherOperationsBenchmark.microByteGather512_MASK  thrpt  30  ops/ms  256  11481.408  11559.655  1.00
GatherOperationsBenchmark.microByteGather512_MASK  thrpt  30  ops/ms  1024  2901.592  2912.954  1.00
GatherOperationsBenchmark.microByteGather512_MASK  thrpt  30  ops/ms  4096  732.899  730.381  0.99
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF  thrpt  30  ops/ms  64  42287.123  43779.227  1.03
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF  thrpt  30  ops/ms  256  11486.167  11448.966  0.99
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2888.047  2928.612  1.01
GatherOperationsBenchmark.microByteGather512_MASK_NZ_OFF  thrpt  30  ops/ms  4096  731.056  738.300  1.00
GatherOperationsBenchmark.microByteGather512_NZ_OFF  thrpt  30  ops/ms  64  51777.670  54368.797  1.05
GatherOperationsBenchmark.microByteGather512_NZ_OFF  thrpt  30  ops/ms  256  14558.532  14662.164  1.00
GatherOperationsBenchmark.microByteGather512_NZ_OFF  thrpt  30  ops/ms  1024  3726.910  3714.448  0.99
GatherOperationsBenchmark.microByteGather512_NZ_OFF  thrpt  30  ops/ms  4096  907.863  903.544  0.99
GatherOperationsBenchmark.microByteGather64  thrpt  30  ops/ms  64  52980.507  54970.689  1.03
GatherOperationsBenchmark.microByteGather64  thrpt  30  ops/ms  256  15044.443  15828.237  1.05
GatherOperationsBenchmark.microByteGather64  thrpt  30  ops/ms  1024  3869.028  4098.172  1.05
GatherOperationsBenchmark.microByteGather64  thrpt  30  ops/ms  4096  912.372  1002.065  1.09
GatherOperationsBenchmark.microByteGather64_MASK  thrpt  30  ops/ms  64  44267.641  45864.381  1.03
GatherOperationsBenchmark.microByteGather64_MASK  thrpt  30  ops/ms  256  12303.206  12920.113  1.05
GatherOperationsBenchmark.microByteGather64_MASK  thrpt  30  ops/ms  1024  3100.867  3115.636  1.00
GatherOperationsBenchmark.microByteGather64_MASK  thrpt  30  ops/ms  4096  792.004  832.623  1.05
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF  thrpt  30  ops/ms  64  40417.638  45844.634  1.13
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF  thrpt  30  ops/ms  256  11628.508  12913.170  1.11
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2911.508  3260.388  1.11
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF  thrpt  30  ops/ms  4096  709.017  835.084  1.17
GatherOperationsBenchmark.microByteGather64_NZ_OFF  thrpt  30  ops/ms  64  48868.987  53585.210  1.09
GatherOperationsBenchmark.microByteGather64_NZ_OFF  thrpt  30  ops/ms  256  13617.963  15754.029  1.15
GatherOperationsBenchmark.microByteGather64_NZ_OFF  thrpt  30  ops/ms  1024  3504.745  3857.926  1.10
GatherOperationsBenchmark.microByteGather64_NZ_OFF  thrpt  30  ops/ms  4096  818.439  958.751  1.17
GatherOperationsBenchmark.microShortGather128  thrpt  30  ops/ms  64  41351.719  44337.947  1.07
GatherOperationsBenchmark.microShortGather128  thrpt  30  ops/ms  256  11175.501  12302.557  1.10
GatherOperationsBenchmark.microShortGather128  thrpt  30  ops/ms  1024  2854.546  3158.973  1.10
GatherOperationsBenchmark.microShortGather128  thrpt  30  ops/ms  4096  744.816  790.304  1.06
GatherOperationsBenchmark.microShortGather128_MASK  thrpt  30  ops/ms  64  35012.934  35728.068  1.02
GatherOperationsBenchmark.microShortGather128_MASK  thrpt  30  ops/ms  256  9408.162  9854.849  1.04
GatherOperationsBenchmark.microShortGather128_MASK  thrpt  30  ops/ms  1024  2352.723  2489.161  1.05
GatherOperationsBenchmark.microShortGather128_MASK  thrpt  30  ops/ms  4096  595.827  634.225  1.06
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  thrpt  30  ops/ms  64  31405.646  35728.077  1.13
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  thrpt  30  ops/ms  256  8459.702  9865.482  1.16
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2095.461  2489.927  1.18
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  thrpt  30  ops/ms  4096  535.715  631.614  1.17
GatherOperationsBenchmark.microShortGather128_NZ_OFF  thrpt  30  ops/ms  64  39996.604  43811.259  1.09
GatherOperationsBenchmark.microShortGather128_NZ_OFF  thrpt  30  ops/ms  256  11058.636  12261.463  1.10
GatherOperationsBenchmark.microShortGather128_NZ_OFF  thrpt  30  ops/ms  1024  2847.482  3157.450  1.10
GatherOperationsBenchmark.microShortGather128_NZ_OFF  thrpt  30  ops/ms  4096  712.089  790.143  1.10
GatherOperationsBenchmark.microShortGather256  thrpt  30  ops/ms  64  51893.730  51975.295  1.00
GatherOperationsBenchmark.microShortGather256  thrpt  30  ops/ms  256  14226.104  14720.390  1.03
GatherOperationsBenchmark.microShortGather256  thrpt  30  ops/ms  1024  3491.958  3714.266  1.06
GatherOperationsBenchmark.microShortGather256  thrpt  30  ops/ms  4096  852.278  905.330  1.06
GatherOperationsBenchmark.microShortGather256_MASK  thrpt  30  ops/ms  64  38736.351  41797.516  1.07
GatherOperationsBenchmark.microShortGather256_MASK  thrpt  30  ops/ms  256  10250.508  11790.235  1.15
GatherOperationsBenchmark.microShortGather256_MASK  thrpt  30  ops/ms  1024  2558.449  2956.936  1.15
GatherOperationsBenchmark.microShortGather256_MASK  thrpt  30  ops/ms  4096  648.882  745.885  1.14
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF  thrpt  30  ops/ms  64  38315.594  39547.847  1.03
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF  thrpt  30  ops/ms  256  10471.955  11779.499  1.12
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2618.623  2679.970  1.02
GatherOperationsBenchmark.microShortGather256_MASK_NZ_OFF  thrpt  30  ops/ms  4096  655.803  760.392  1.15
GatherOperationsBenchmark.microShortGather256_NZ_OFF  thrpt  30  ops/ms  64  47674.080  51325.185  1.07
GatherOperationsBenchmark.microShortGather256_NZ_OFF  thrpt  30  ops/ms  256  13446.700  14438.516  1.07
GatherOperationsBenchmark.microShortGather256_NZ_OFF  thrpt  30  ops/ms  1024  3371.433  3664.720  1.08
GatherOperationsBenchmark.microShortGather256_NZ_OFF  thrpt  30  ops/ms  4096  814.540  895.182  1.09
GatherOperationsBenchmark.microShortGather512  thrpt  30  ops/ms  64  48183.553  48374.790  1.01
GatherOperationsBenchmark.microShortGather512  thrpt  30  ops/ms  256  13669.806  12940.433  0.94
GatherOperationsBenchmark.microShortGather512  thrpt  30  ops/ms  1024  3371.708  3318.627  0.98
GatherOperationsBenchmark.microShortGather512  thrpt  30  ops/ms  4096  847.620  805.313  0.95
GatherOperationsBenchmark.microShortGather512_MASK  thrpt  30  ops/ms  64  39566.443  42845.296  1.08
GatherOperationsBenchmark.microShortGather512_MASK  thrpt  30  ops/ms  256  11926.440  10308.223  0.86
GatherOperationsBenchmark.microShortGather512_MASK  thrpt  30  ops/ms  1024  3008.542  2546.197  0.84
GatherOperationsBenchmark.microShortGather512_MASK  thrpt  30  ops/ms  4096  764.497  647.276  0.84
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF  thrpt  30  ops/ms  64  38106.800  42835.120  1.12
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF  thrpt  30  ops/ms  256  10405.171  11125.164  1.06
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2526.827  2799.209  1.10
GatherOperationsBenchmark.microShortGather512_MASK_NZ_OFF  thrpt  30  ops/ms  4096  655.044  715.519  1.09
GatherOperationsBenchmark.microShortGather512_NZ_OFF  thrpt  30  ops/ms  64  48108.682  46654.427  0.96
GatherOperationsBenchmark.microShortGather512_NZ_OFF  thrpt  30  ops/ms  256  13197.197  12957.497  0.98
GatherOperationsBenchmark.microShortGather512_NZ_OFF  thrpt  30  ops/ms  1024  3397.959  3244.415  0.95
GatherOperationsBenchmark.microShortGather512_NZ_OFF  thrpt  30  ops/ms  4096  824.034  820.536  0.99
GatherOperationsBenchmark.microShortGather64  thrpt  30  ops/ms  64  44815.622  46913.289  1.04
GatherOperationsBenchmark.microShortGather64  thrpt  30  ops/ms  256  12317.166  13536.731  1.09
GatherOperationsBenchmark.microShortGather64  thrpt  30  ops/ms  1024  3157.683  3539.991  1.12
GatherOperationsBenchmark.microShortGather64  thrpt  30  ops/ms  4096  775.626  878.304  1.13
GatherOperationsBenchmark.microShortGather64_MASK  thrpt  30  ops/ms  64  37064.157  35649.776  0.96
GatherOperationsBenchmark.microShortGather64_MASK  thrpt  30  ops/ms  256  10120.291  9403.1319  0.92
GatherOperationsBenchmark.microShortGather64_MASK  thrpt  30  ops/ms  1024  2546.723  2642.781  1.03
GatherOperationsBenchmark.microShortGather64_MASK  thrpt  30  ops/ms  4096  644.270  648.432  1.00
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF  thrpt  30  ops/ms  64  34386.819  37883.550  1.10
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF  thrpt  30  ops/ms  256  9316.097  10500.473  1.12
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF  thrpt  30  ops/ms  1024  2344.570  2643.114  1.12
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF  thrpt  30  ops/ms  4096  594.445  595.301  1.00
GatherOperationsBenchmark.microShortGather64_NZ_OFF  thrpt  30  ops/ms  64  40240.772  48435.477  1.20
GatherOperationsBenchmark.microShortGather64_NZ_OFF  thrpt  30  ops/ms  256  11082.392  13736.985  1.23
GatherOperationsBenchmark.microShortGather64_NZ_OFF  thrpt  30  ops/ms  1024  2777.065  3549.704  1.27
GatherOperationsBenchmark.microShortGather64_NZ_OFF  thrpt  30  ops/ms  4096  697.671  877.411  1.25

Note that this patch is split from https://github.com/openjdk/jdk/pull/24679. A follow-up PR will implement the SVE subword gather load operations after this PR is merged.
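For reading the Gain column in the data above: it appears to be the ratio of the After score to the Before score. Since the scores are throughput (ops/ms), a value above 1 is an improvement and a value below 1 is a regression. The small Java helper below is only a sketch of that arithmetic; `GainSketch` and `gain` are hypothetical names, and the listed two-decimal values may differ in the last digit from a freshly computed ratio.

```java
final class GainSketch {
    /** Ratio of the score after the patch to the score before it (throughput: higher is better). */
    static double gain(double before, double after) {
        return after / before;
    }
}
```

For example, the `microShortGather64_NZ_OFF` row with SIZE 1024 (2777.065 before, 3549.704 after) gives a ratio of roughly 1.28, in line with the "up to 27%" figure, while the `microByteGather256` rows come out below 1.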
[1] https://bugs.openjdk.org/browse/JDK-8318650 [2] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LD1B--scalar-plus-vector---Gather-load-unsigned-bytes-to-vector--vector-index--?lang=en [3] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LD1H--scalar-plus-vector---Gather-load-unsigned-halfwords-to-vector--vector-index--?lang=en ------------- Commit messages: - 8355563: VectorAPI: Refactor current implementation of subword gather load API Changes: https://git.openjdk.org/jdk/pull/25138/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25138&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355563 Stats: 441 lines in 15 files changed: 105 ins; 176 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/25138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25138/head:pull/25138 PR: https://git.openjdk.org/jdk/pull/25138 From xgong at openjdk.org Fri May 9 07:44:27 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 9 May 2025 07:44:27 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. 
At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on > architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... Hi @eme64 , could you please help take a look at this PR, which is a part of https://github.com/openjdk/jdk/pull/24679 ? Thanks a lot in advance! Hi @jatin-bhateja , could you please kindly review this PR, especially the X86 codegen part? Thanks a lot in advance! 
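To make the refactoring a bit more concrete, here is a scalar Java model of a subword (byte) gather load. It is a sketch under assumptions, not the Vector API or HotSpot code: `GatherSketch`, `gatherWithOffset`, `checkedIndices` and `gatherPrecomputed` are hypothetical helpers. The first variant adds `offset` at every access. The second builds the index vector once during range validation and reuses it for the loads, which is the closest scalar analogue of passing the generated index vectors down. How the offset is actually folded into the base address at the IR level is not modeled here.

```java
final class GatherSketch {
    /** Per-lane model: every access recomputes offset + indexMap[mapOffset + i]. */
    static byte[] gatherWithOffset(byte[] a, int offset, int[] indexMap, int mapOffset, int lanes) {
        byte[] r = new byte[lanes];
        for (int i = 0; i < lanes; i++) {
            r[i] = a[offset + indexMap[mapOffset + i]];
        }
        return r;
    }

    /** Builds the index vector once and validates every lane against the array length. */
    static int[] checkedIndices(int length, int offset, int[] indexMap, int mapOffset, int lanes) {
        int[] idx = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            idx[i] = offset + indexMap[mapOffset + i];
            if (idx[i] < 0 || idx[i] >= length) {
                throw new IndexOutOfBoundsException("lane " + i + ": index " + idx[i]);
            }
        }
        return idx;
    }

    /** Gather that reuses an already validated index vector; no further adds are needed. */
    static byte[] gatherPrecomputed(byte[] a, int[] idx) {
        byte[] r = new byte[idx.length];
        for (int i = 0; i < idx.length; i++) {
            r[i] = a[idx[i]];
        }
        return r;
    }
}
```

Both variants return the same lanes for valid inputs; the difference is only where the index arithmetic happens and whether its result can be shared between the range check and the load.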
------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2865493287 PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2865495716 From yzheng at openjdk.org Fri May 9 08:42:06 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 9 May 2025 08:42:06 GMT Subject: RFR: 8353735: [JVMCI] Allow specifying storage kind of the callee save register [v3] In-Reply-To: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> References: <_8_bdUwiZc5xZqStJm2XfneFUTdCEx4c_uDsKJcMkTc=.1df612b0-30c8-4ae3-8706-bd634dd9fbc4@github.com> Message-ID: On Thu, 8 May 2025 14:57:10 GMT, Yudi Zheng wrote: >> Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Update javadoc Tier1-3 passed. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24451#issuecomment-2865675256 From yzheng at openjdk.org Fri May 9 08:42:07 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 9 May 2025 08:42:07 GMT Subject: Integrated: 8353735: [JVMCI] Allow specifying storage kind of the callee save register In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 14:47:39 GMT, Yudi Zheng wrote: > Windows x64 ABI considers the upper portions of YMM0-YMM15 and ZMM0-ZMM15 volatile, that is, destroyed on function calls. This PR allows `RegisterConfig` implementations to refine the storage kind of callee save register, such that JVMCI compiler can exploit this information to avoid saving full width of these registers. This pull request has now been integrated. 
Changeset: 74e981e8 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/74e981e85509ca072b2a45d529dab3a9883613a2 Stats: 11 lines in 1 file changed: 10 ins; 0 del; 1 mod 8353735: [JVMCI] Allow specifying storage kind of the callee save register Reviewed-by: dnsimon, cslucas ------------- PR: https://git.openjdk.org/jdk/pull/24451 From rcastanedalo at openjdk.org Fri May 9 08:56:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 9 May 2025 08:56:53 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> Message-ID: <_aHKGu0ncIbByHUDzqW4xfhPBtN6f2XKy_tJLW0KM30=.a341f11d-e29b-4186-b31d-b33d1e4e2ebc@github.com> On Thu, 8 May 2025 15:16:28 GMT, Daniel Lundén wrote: > Now simplified, does it look better? Yes, thanks! >> src/hotspot/share/opto/gcm.cpp line 779: >> >>> 777: ResourceArea* area = Thread::current()->resource_area(); >>> 778: >>> 779: // Bookkeeping of possibly anti-dependent stores that we find outside of the >> >> Suggestion: >> >> // Bookkeeping of possibly anti-dependent stores that we find below the > > Technically, "outside of" is more appropriate here, because the stores that we bookkeep are not necessarily dominated by ("below") early. Since the search starts from `initial_mem`, which can be in a much earlier block than early, stores that we bookkeep can also be above early, or on completely distinct control-flow paths that do not even go through early. But, you are correct that only stores below early matter in the end. Thanks for the clarification. I think it would be good to add this note to the comment ("Note that stores in non_early_stores are not necessarily dominated by early. Since the search starts from ..." ).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2081240077 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2081235028 From roland at openjdk.org Fri May 9 09:00:36 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 09:00:36 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v17] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/ed774a56..b56a2649 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=15-16 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From rcastanedalo at openjdk.org Fri May 9 09:05:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 9 May 2025 09:05:53 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v4] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 15:31:42 GMT, Daniel Lundén wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. 
>> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Rename must_raise_LCA to must_raise_LCA_above_marks Thanks for addressing my comments, looks good! I just have an (optional) suggestion (https://github.com/openjdk/jdk/pull/24926/files#r2081235028). ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2827607995 From roland at openjdk.org Fri May 9 09:08:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 09:08:41 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v18] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. 
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/utilities/globalDefinitions.hpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/b56a2649..9798bc97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=16-17 Stats: 5 lines in 3 files changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Fri May 9 09:13:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 09:13:25 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v19] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. 
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/9798bc97..1289e211 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From duke at openjdk.org Fri May 9 09:40:54 2025 From: duke at openjdk.org (erifan) Date: Fri, 9 May 2025 09:40:54 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 02:10:56 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4... I'll update the code next week, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2865870304 From duke at openjdk.org Fri May 9 09:40:55 2025 From: duke at openjdk.org (erifan) Date: Fri, 9 May 2025 09:40:55 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 01:49:45 GMT, Xiaohong Gong wrote: >> Yes, that's the right approach. For this PR, I think you can mix some test points covering compare, xor(maskAll(true)). > > Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be better, and that will benefit AArch64 as well, since `MaskAll()` is much cheaper than `fromLong()` on AArch64. We can add such a transformation with another PR. Ok, I'll extend the test to xor(maskAll(true)) in the next commit, thanks! 
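Editor's note, not part of the original message: the rewrite quoted above accepts every comparison condition for integer lanes but only eq and ne for float and double lanes. A minimal scalar sketch of one plausible reason follows. The NaN explanation is an assumption on my part; the thread does not state the reason. The class and method names are hypothetical, and plain scalar comparisons stand in for vector lanes.

```java
// Hypothetical sketch: when is NOT(a cond b) equal to (a ncond b)?
public class NegatedCompareSketch {

    // Integer lanes: each condition has an exact negation (lt/ge, le/gt, eq/ne).
    static boolean intNegationHolds(int a, int b) {
        return (!(a < b) == (a >= b))
            && (!(a <= b) == (a > b))
            && (!(a == b) == (a != b));
    }

    // Floating-point lanes: eq and ne stay exact negations, even with NaN operands.
    static boolean floatEqNeNegationHolds(float a, float b) {
        return !(a == b) == (a != b);
    }

    // Floating-point lanes: lt and ge are NOT negations of each other when an
    // operand is NaN, because every ordered comparison with NaN is false.
    static boolean floatLtGeNegationHolds(float a, float b) {
        return !(a < b) == (a >= b);
    }

    public static void main(String[] args) {
        System.out.println("int, all conds:    " + intNegationHolds(3, 5));
        System.out.println("float eq/ne, NaN:  " + floatEqNeNegationHolds(Float.NaN, 1.0f));
        System.out.println("float lt/ge, NaN:  " + floatLtGeNegationHolds(Float.NaN, 1.0f));
    }
}
```

Under this reading, folding the NOT into an ordered floating-point compare would change results for NaN inputs, which would be consistent with the patch restricting float and double to eq and ne.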
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2081313613 From amitkumar at openjdk.org Fri May 9 10:46:06 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 9 May 2025 10:46:06 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: <60EKzMgxd8YXL1DaKBnEYxGpf-WYhiqOFTzqCvJcYzk=.b923d1ee-8698-44e1-9876-afa708d05cd2@github.com> References: <60EKzMgxd8YXL1DaKBnEYxGpf-WYhiqOFTzqCvJcYzk=.b923d1ee-8698-44e1-9876-afa708d05cd2@github.com> Message-ID: On Thu, 8 May 2025 09:46:14 GMT, Martin Doerr wrote: > > Hi Martin, So what will be next step here ? Should I put this question in community mailing list ? > > You can try. `java.nio` and FFM API may have some requirements and expectations for such `Unsafe` operations. I couldn't see any failure @TheRealMDoerr ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2866076757 From hgreule at openjdk.org Fri May 9 10:49:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 9 May 2025 10:49:56 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Thu, 17 Apr 2025 09:22:42 GMT, Emanuel Peter wrote: >> Looks good. > > @iwanowww I see you did some internal testing, but not for what version. Should we re-run testing? Thanks for testing again @eme64. Are the results in? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2866086310 From shade at openjdk.org Fri May 9 11:23:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 11:23:55 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> References: <7z9_pstIUOpdc3pzP49bmS4itCp75RlnKFuQ6-HQzWE=.082f8aaf-3134-4489-a8ad-71754338f8cb@github.com> Message-ID: <_8y_DYl9Q4P1scTtA_J8ilWw_GP0kdSL37bAmYb4dEM=.ea34a76f-0236-459f-b99c-a8d6129c3a67@github.com> On Thu, 8 May 2025 14:29:56 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/compiler/compileBroker.cpp line 1697: >> >>> 1695: JavaThread* thread = JavaThread::current(); >>> 1696: >>> 1697: methodHandle method(thread, task->method()); >> >> I think this is safe because the Method* is in the CompileTask and redefinition will find it there. Being unsure of this is why this is here in a handle. > > Ah, that reminds me, thanks. > > I removed this because I caught method to be in unsafe (unloaded) state, so `method()` asserted on me. `compiler/c1/TestConcurrentPatching.java` seems to intermittently crash on it. On this code path, I think we might be plausibly waiting on unloaded compile task, and we "only" wait for notification that task got purged from the queue. Handelizing broken `Method*` is awkward, to say the least! > > Then again, I am not sure if removing this handle is safe enough. So out of abundance of caution, we can actually handelize `Method*` after checking for task status. But now that I do this: > > > methodHandle method(thread, task->is_unloaded() ? nullptr : task->method()); > > > ...the test still fails on the same assert! Which makes no sense to me, as we are supposed to be guarded by `is_unloaded` check before it. Something is off, I'll investigate. I understand now. There are TOCTOU-s under concurrent `block_unloading`. 
The most egregious one is here: `is_unloaded` checks in two steps: `!_weak_handle.is_empty() && _weak_handle.peek() == nullptr;`. So when `block_unloading` comes in concurrently and resets weak to empty (since we have strong handle now), it might be possible that first predicate is still `true`, but evaluation of second predicate calls `peek` on empty `_weak_handle`, oops. We could technically claim that `UnloadableMethodHandle` is not thread-safe, but it does not solve current compiler uses, and it is very unsatisfactory for the utility class. I'll look into ways to make it resilient under concurrent updates. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24018#discussion_r2081467353 From roland at openjdk.org Fri May 9 12:12:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 12:12:59 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 10:36:33 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 43 commits: >> >> - review >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge fix >> - Merge branch 'master' into JDK-8342692 >> - merge >> - Merge branch 'master' into JDK-8342692 >> - Merge branch 'master' into JDK-8342692 >> - whitespace >> - ... and 33 more: https://git.openjdk.org/jdk/compare/4458719a...ed774a56 > > src/hotspot/share/runtime/deoptimization.hpp line 122: > >> 120: Reason_short_running_long_loop, // profile reports loop runs for small number of iterations >> 121: #if INCLUDE_JVMCI >> 122: Reason_aliasing = Reason_short_running_long_loop, // optimistic assumption about aliasing failed > > Why is that required? Otherwise, this assert: assert((1 << _reason_bits) >= Reason_LIMIT, "enough bits"); fails. 
Rather than tweak the allocation of bits to `_action_bits`, `_reason_bits`, `_debug_id_bits`, to extend `_reason_bits`, I thought it was simpler to have c2 and graal share the encoding of a reason given graal doesn't use the new `Reason_short_running_long_loop` and c2 doesn't use the jvmci specific `Reason_aliasing`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2081538237 From epeter at openjdk.org Fri May 9 12:26:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 9 May 2025 12:26:57 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 10:47:02 GMT, Hannes Greule wrote: >> @iwanowww I see you did some internal testing, but not for what version. Should we re-run testing? > > Thanks for testing again @eme64. Are the results in? @SirYwell They are **almost** completed, but so far no failure :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2866354170 From fyang at openjdk.org Fri May 9 12:34:50 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 9 May 2025 12:34:50 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub In-Reply-To: References: Message-ID: On Fri, 9 May 2025 03:19:11 GMT, Anjian-Wen wrote: > When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. > We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` > with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 > > This fixes this issue by using a small loop to fill the elements for short arrays. > Tier1-2 tested on linux-riscv64 platform. 
JMH result on P550 SBC for reference: > 1. (`@Param("5") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op > ArrayFill.zeroByteArray 5 avgt 12 559.249 ± 1.909 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.006 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.776 ± 0.005 ns/op > ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op > > 2. (`@Param("3") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op > ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op > ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op > ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.631 ± 0.003 ns/op > ArrayFill.fillShortArray 3 avgt 12 20.436 ± 0.288 n... Looks reasonable. Thanks for finding this. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25135#pullrequestreview-2828141831 From roland at openjdk.org Fri May 9 12:41:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 12:41:14 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: Message-ID: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... 
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Emanuel's review - Christian's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/1289e211..223c9481 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=18-19 Stats: 77 lines in 6 files changed: 55 ins; 1 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Fri May 9 12:41:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 12:41:14 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 11:09:57 GMT, Christian Hagedorn wrote: > Interesting work and results with the benchmark! I have a few comments. Will have another look again later. Thanks for reviewing this @chhagedorn I think new commit addresses all your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2866384327 From roland at openjdk.org Fri May 9 12:41:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 12:41:14 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 12:36:01 GMT, Roland Westrelin wrote: >> Interesting work and results with the benchmark! I have a few comments. Will have another look again later. > >> Interesting work and results with the benchmark! I have a few comments. Will have another look again later. > > Thanks for reviewing this @chhagedorn > I think new commit addresses all your comments. 
> @rwestrel Sorry it took so long for me to look at this. > > Did you do some benchmarking to prove that `ShortLoopIter = 1000` is reasonable? As mentioned in one of my replies to your questions, I removed that command line argument. > The Benchmark you published earlier does not seem to do that, right? [#21630 (comment)](https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221) > > I would also like to see that benchmark integrated. If you are using a benchmark to demonstrate the performance, it should be integrate so others can easily verify on their platform :) Right. But it's Maurizio's benchmark. I think it would make sense to integrate it separately. What do you think @mcimadamore ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2866389669 From roland at openjdk.org Fri May 9 12:41:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 9 May 2025 12:41:14 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v15] In-Reply-To: References: <-rQb2ZR6hrzt-7Q0EwQqlxjvVuDQQOgYqzX3tZVPL38=.2577f4e0-c35f-434e-88d1-f0db41bb5364@github.com> Message-ID: On Wed, 7 May 2025 21:24:40 GMT, Emanuel Peter wrote: >> Wouldn't I then need to duplicate every `@run` line in the test i.e.: >> >> @run driver compiler.loopopts.superword.TestMemorySegment ByteArray >> @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector >> >> >> would become: >> >> >> @run driver compiler.loopopts.superword.TestMemorySegment ByteArray >> @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector >> @run driver compiler.loopopts.superword.TestMemorySegment ByteArray ShortLoop >> @run driver compiler.loopopts.superword.TestMemorySegment ByteArray AlignVector ShortLoop >> >> >> Same for `CharArray` etc... >> That seems like a lot of extra complexity. 
Or would it be sufficient to only add it for `ByteArray` to have the non short loop case at least minimally covered? > > Yeah, I would only do it for one or two cases. Doing it for all would be a little excessive, and eventually we have too many combinations. Sounds reasonable. Done in new commits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2081580237 From bkilambi at openjdk.org Fri May 9 13:22:26 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 9 May 2025 13:22:26 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 Message-ID: Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. Testing: Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. ------------- Commit messages: - 8355708: Two Float16 IR tests fail after JDK-8345125 Changes: https://git.openjdk.org/jdk/pull/25141/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25141&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355708 Stats: 5 lines in 3 files changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25141.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25141/head:pull/25141 PR: https://git.openjdk.org/jdk/pull/25141 From mdoerr at openjdk.org Fri May 9 13:32:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 9 May 2025 13:32:56 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: <2NUKCBO7aaoQYPLVWn_rJ4nL28qtgm1OqeD6Zhil2mQ=.f5eca835-22bf-44c1-a2e1-71bdf1cd9401@github.com> On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation Thanks! That sounds like mvc should better not be used for `Unsafe` operations. Seeing no failures in some tests doesn't prove that it's safe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2866557839 From duke at openjdk.org Fri May 9 13:40:54 2025 From: duke at openjdk.org (Ulrich Weigand) Date: Fri, 9 May 2025 13:40:54 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: <60EKzMgxd8YXL1DaKBnEYxGpf-WYhiqOFTzqCvJcYzk=.b923d1ee-8698-44e1-9876-afa708d05cd2@github.com> Message-ID: On Fri, 9 May 2025 10:43:40 GMT, Amit Kumar wrote: >>> Hi Martin, So what will be next step here ? Should I put this question in community mailing list ? >> >> You can try. `java.nio` and FFM API may have some requirements and expectations for such `Unsafe` operations. > >> > Hi Martin, So what will be next step here ? Should I put this question in community mailing list ? >> >> You can try. `java.nio` and FFM API may have some requirements and expectations for such `Unsafe` operations. > > I couldn't see any failure @TheRealMDoerr @offamitkumar asked me to comment here w.r.t. s390x architecture questions. I'm not sure what exactly the requirements are this function needs to guarantee, but the s390x MVC instruction has the following properties: - It is a single operation as observed by the CPU that executes it, that is it either completes fully or not at all (e.g. in the case of accessing unmapped memory addresses). In that respect it behaves similarly to a single store instruction. 
- As observed by *other* CPUs, MVC does not make any guarantee of atomicity (at least not in the case where source and destination overlap, which we have in this example). That is, other CPUs may observe in memory either the state before the operation, after the operation, or any intermediate state. [ In that respect, MVC behaves _differently_ from a store instruction, where e.g. an 8 byte store to a naturally aligned address is guaranteed to be atomic: other CPUs will either see the old value or the new, but not anything in between. ] So whether or not MVC is the correct choice here depends on the assumptions the callers make on this routine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2866326026 From dlunden at openjdk.org Fri May 9 13:46:09 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 9 May 2025 13:46:09 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v5] In-Reply-To: References: Message-ID: > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. > > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`.
> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. > - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Revise explanation of non_early_stores ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24926/files - new: https://git.openjdk.org/jdk/pull/24926/files/e4fb8a0d..a4a6a778 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=03-04 Stats: 9 lines in 1 file changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926 PR: https://git.openjdk.org/jdk/pull/24926 From dlunden at openjdk.org Fri May 9 13:48:53 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 9 May 2025 13:48:53 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v2] In-Reply-To: <_aHKGu0ncIbByHUDzqW4xfhPBtN6f2XKy_tJLW0KM30=.a341f11d-e29b-4186-b31d-b33d1e4e2ebc@github.com> References: <95YBckz3m_3L4DtOY38G7BjOFvljWoqGRqV3EIJi2-8=.f06b86d4-6e2f-4cb6-b5e2-382c7831b4d3@github.com> <_aHKGu0ncIbByHUDzqW4xfhPBtN6f2XKy_tJLW0KM30=.a341f11d-e29b-4186-b31d-b33d1e4e2ebc@github.com> Message-ID: <5pqZqfKvgPrvogeuZuaB8JLRKD0RcIEnWtEsds2A1oU=.9bc469c3-b0c3-4450-ab37-9903821373ee@github.com> On Fri, 9 May 2025 08:50:54 GMT, Roberto Casta?eda Lozano wrote: >> Technically, "outside of" is more 
appropriate here, because the stores that we bookkeep are not necessarily dominated by ("below") early. Since the search starts from `initial_mem`, which can be in a much earlier block than early, stores that we bookkeep can also be above early, or on completely distinct control-flow paths that do not even go through early. But, you are correct that only stores below early matter in the end. > > Thanks for the clarification. I think it would be good to add this note to the comment ("Note that stores in non_early_stores are not necessarily dominated by early. Since the search starts from ..." ). Yes, I agree. I've now revised this comment: // Bookkeeping of possibly anti-dependent stores that we find outside of the // early block and that may need anti-dependence edges. Note that stores in // non_early_stores are not necessarily dominated by early. The search starts // from initial_mem, which can reside in a block that dominates early, and // therefore, stores we find may be in blocks that are on completely distinct // control-flow paths compared to early. However, in the end, only stores in // blocks dominated by early matters. The reason for bookkeeping not only // relevant stores is efficiency: we lazily record all possible // anti-dependent stores and add anti-dependence edges only to the relevant // ones at the very end of this method when we know the final updated LCA. I wrote in my first comment that stores we find can be above (i.e., dominate) early, but this is not actually correct. It doesn't make sense that we overwrite the memory of a load before we even reach early. But, it is still very common that we find stores on completely distinct control-flow paths compared to early. 
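The lazy bookkeeping described in the revised comment can be illustrated with a small, self-contained Java model. This is only a sketch under stated assumptions, not the C2 implementation: blocks are plain integers in a hypothetical dominator tree, the names `LazyAntiDepSketch`, `raise`, and `Result` are made up for illustration, and treating the "relevant" stores as those located in the final LCA block is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

class LazyAntiDepSketch {
    /** Final LCA block plus the indices of the stores that received an edge. */
    record Result(int lca, List<Integer> edges) {}

    private final int[] idom;   // idom[b] = immediate dominator of block b (block 0 is the root)
    private final int[] depth;  // depth of each block in the dominator tree

    LazyAntiDepSketch(int[] idom) {
        this.idom = idom;
        this.depth = new int[idom.length];
        // Assumes blocks are numbered so that idom[b] < b for every b > 0.
        for (int b = 1; b < idom.length; b++) {
            depth[b] = depth[idom[b]] + 1;
        }
    }

    /** True if block a dominates block b (every block dominates itself). */
    boolean dominates(int a, int b) {
        while (depth[b] > depth[a]) b = idom[b];
        return a == b;
    }

    /** Closest common dominator of blocks a and b. */
    int domLca(int a, int b) {
        while (a != b) {
            if (depth[a] >= depth[b]) a = idom[a]; else b = idom[b];
        }
        return a;
    }

    /** Assumes that early dominates the initial lca. */
    Result raise(int early, int lca, int[] storeBlocks) {
        List<Integer> nonEarlyStores = new ArrayList<>();
        for (int i = 0; i < storeBlocks.length; i++) {
            int sb = storeBlocks[i];
            if (sb == early) { lca = early; continue; } // the load cannot go below early
            nonEarlyStores.add(i);                      // record lazily, relevant or not
            // Only stores in blocks dominated by early can raise the LCA.
            if (dominates(early, sb)) lca = domLca(lca, sb);
        }
        // Edges are added only now, when the final LCA is known.
        List<Integer> edges = new ArrayList<>();
        for (int i : nonEarlyStores) {
            if (storeBlocks[i] == lca) edges.add(i);
        }
        return new Result(lca, edges);
    }

    public static void main(String[] args) {
        // Dominator tree: 0 -> {1, 5}, 1 -> 2, 2 -> {3, 4}; early = 1, initial LCA = 3.
        LazyAntiDepSketch s = new LazyAntiDepSketch(new int[] {0, 0, 1, 2, 2, 0});
        System.out.println(s.raise(1, 3, new int[] {4, 5, 2}));
    }
}
```

In this example the store in block 5 sits on a control-flow path that does not go through early: it is recorded but never influences the LCA, which ends up at block 2, and only the store in block 2 gets an edge.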
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2081733516 From aph at openjdk.org Fri May 9 14:16:06 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 9 May 2025 14:16:06 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory Message-ID: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).

Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op

------------- Commit messages: - 8354674: AArch64: Intrinsify Unsafe::setMemory - New test - Cleanup - Fixes - next cut - First cut Changes: https://git.openjdk.org/jdk/pull/25147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354674 Stats: 99 lines in 2 files changed: 96 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147 PR: https://git.openjdk.org/jdk/pull/25147 From aph at openjdk.org Fri May 9 14:25:50 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 9 May 2025 14:25:50 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley wrote: > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). > > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op > MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op > MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op > MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op Apple M1, small memory blocks: Benchmark (aligned) (size) Mode Cnt Score Error Units MemorySegmentFillUnsafe.panama true 1 avgt 10 1.731 ± 0.001 ns/op MemorySegmentFillUnsafe.panama true 2 avgt 10 1.570 ± 0.001 ns/op MemorySegmentFillUnsafe.panama true 3 avgt 10 1.583 ± 
0.014 ns/op MemorySegmentFillUnsafe.panama true 4 avgt 10 1.734 ? 0.014 ns/op MemorySegmentFillUnsafe.panama true 5 avgt 10 1.736 ? 0.001 ns/op MemorySegmentFillUnsafe.panama true 6 avgt 10 1.731 ? 0.001 ns/op MemorySegmentFillUnsafe.panama true 7 avgt 10 1.744 ? 0.002 ns/op MemorySegmentFillUnsafe.panama true 8 avgt 10 2.365 ? 0.005 ns/op MemorySegmentFillUnsafe.panama true 15 avgt 10 2.681 ? 0.001 ns/op MemorySegmentFillUnsafe.panama true 16 avgt 10 2.503 ? 0.003 ns/op MemorySegmentFillUnsafe.panama true 63 avgt 10 3.615 ? 0.003 ns/op MemorySegmentFillUnsafe.panama true 64 avgt 10 4.701 ? 0.056 ns/op MemorySegmentFillUnsafe.panama true 255 avgt 10 4.848 ? 0.004 ns/op MemorySegmentFillUnsafe.panama true 256 avgt 10 5.003 ? 0.003 ns/op MemorySegmentFillUnsafe.panama false 1 avgt 10 1.729 ? 0.001 ns/op MemorySegmentFillUnsafe.panama false 2 avgt 10 1.571 ? 0.003 ns/op MemorySegmentFillUnsafe.panama false 3 avgt 10 1.579 ? 0.010 ns/op MemorySegmentFillUnsafe.panama false 4 avgt 10 1.728 ? 0.002 ns/op MemorySegmentFillUnsafe.panama false 5 avgt 10 1.739 ? 0.019 ns/op MemorySegmentFillUnsafe.panama false 6 avgt 10 1.731 ? 0.002 ns/op MemorySegmentFillUnsafe.panama false 7 avgt 10 1.744 ? 0.012 ns/op MemorySegmentFillUnsafe.panama false 8 avgt 10 2.367 ? 0.002 ns/op MemorySegmentFillUnsafe.panama false 15 avgt 10 2.694 ? 0.030 ns/op MemorySegmentFillUnsafe.panama false 16 avgt 10 2.517 ? 0.057 ns/op MemorySegmentFillUnsafe.panama false 63 avgt 10 3.619 ? 0.009 ns/op MemorySegmentFillUnsafe.panama false 64 avgt 10 4.708 ? 0.057 ns/op MemorySegmentFillUnsafe.panama false 255 avgt 10 5.018 ? 0.057 ns/op MemorySegmentFillUnsafe.panama false 256 avgt 10 5.038 ? 0.068 ns/op MemorySegmentFillUnsafe.unsafe true 1 avgt 10 2.815 ? 0.002 ns/op MemorySegmentFillUnsafe.unsafe true 2 avgt 10 2.821 ? 0.022 ns/op MemorySegmentFillUnsafe.unsafe true 3 avgt 10 2.502 ? 0.002 ns/op MemorySegmentFillUnsafe.unsafe true 4 avgt 10 2.815 ? 
0.004 ns/op MemorySegmentFillUnsafe.unsafe true 5 avgt 10 2.502 ? 0.003 ns/op MemorySegmentFillUnsafe.unsafe true 6 avgt 10 2.505 ? 0.022 ns/op MemorySegmentFillUnsafe.unsafe true 7 avgt 10 2.193 ? 0.019 ns/op MemorySegmentFillUnsafe.unsafe true 8 avgt 10 2.190 ? 0.002 ns/op MemorySegmentFillUnsafe.unsafe true 15 avgt 10 2.043 ? 0.027 ns/op MemorySegmentFillUnsafe.unsafe true 16 avgt 10 2.191 ? 0.003 ns/op MemorySegmentFillUnsafe.unsafe true 63 avgt 10 2.061 ? 0.040 ns/op MemorySegmentFillUnsafe.unsafe true 64 avgt 10 2.196 ? 0.027 ns/op MemorySegmentFillUnsafe.unsafe true 255 avgt 10 3.756 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 256 avgt 10 3.752 ? 0.002 ns/op MemorySegmentFillUnsafe.unsafe false 1 avgt 10 2.813 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 2 avgt 10 2.817 ? 0.003 ns/op MemorySegmentFillUnsafe.unsafe false 3 avgt 10 2.502 ? 0.003 ns/op MemorySegmentFillUnsafe.unsafe false 4 avgt 10 2.816 ? 0.002 ns/op MemorySegmentFillUnsafe.unsafe false 5 avgt 10 2.507 ? 0.027 ns/op MemorySegmentFillUnsafe.unsafe false 6 avgt 10 2.507 ? 0.025 ns/op MemorySegmentFillUnsafe.unsafe false 7 avgt 10 2.195 ? 0.025 ns/op MemorySegmentFillUnsafe.unsafe false 8 avgt 10 2.192 ? 0.005 ns/op MemorySegmentFillUnsafe.unsafe false 15 avgt 10 2.050 ? 0.025 ns/op MemorySegmentFillUnsafe.unsafe false 16 avgt 10 2.188 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 63 avgt 10 2.051 ? 0.027 ns/op MemorySegmentFillUnsafe.unsafe false 64 avgt 10 2.196 ? 0.015 ns/op MemorySegmentFillUnsafe.unsafe false 255 avgt 10 4.619 ? 0.029 ns/op MemorySegmentFillUnsafe.unsafe false 256 avgt 10 4.618 ? 0.047 ns/op Graviton 4, small memory blocks: Benchmark (aligned) (size) Mode Cnt Score Error Units MemorySegmentFillUnsafe.panama true 1 avgt 10 1.970 ? 0.002 ns/op MemorySegmentFillUnsafe.panama true 2 avgt 10 1.966 ? 0.020 ns/op MemorySegmentFillUnsafe.panama true 3 avgt 10 1.963 ? 0.014 ns/op MemorySegmentFillUnsafe.panama true 4 avgt 10 1.989 ? 
0.004 ns/op MemorySegmentFillUnsafe.panama true 5 avgt 10 2.030 ? 0.010 ns/op MemorySegmentFillUnsafe.panama true 6 avgt 10 2.027 ? 0.010 ns/op MemorySegmentFillUnsafe.panama true 7 avgt 10 2.077 ? 0.006 ns/op MemorySegmentFillUnsafe.panama true 8 avgt 10 2.557 ? 0.004 ns/op MemorySegmentFillUnsafe.panama true 15 avgt 10 3.176 ? 0.002 ns/op MemorySegmentFillUnsafe.panama true 16 avgt 10 2.779 ? 0.001 ns/op MemorySegmentFillUnsafe.panama true 63 avgt 10 4.302 ? 0.002 ns/op MemorySegmentFillUnsafe.panama true 64 avgt 10 4.292 ? 0.007 ns/op MemorySegmentFillUnsafe.panama true 255 avgt 10 6.311 ? 0.013 ns/op MemorySegmentFillUnsafe.panama true 256 avgt 10 5.394 ? 0.003 ns/op MemorySegmentFillUnsafe.panama false 1 avgt 10 1.970 ? 0.001 ns/op MemorySegmentFillUnsafe.panama false 2 avgt 10 1.937 ? 0.017 ns/op MemorySegmentFillUnsafe.panama false 3 avgt 10 1.954 ? 0.014 ns/op MemorySegmentFillUnsafe.panama false 4 avgt 10 1.985 ? 0.005 ns/op MemorySegmentFillUnsafe.panama false 5 avgt 10 2.006 ? 0.008 ns/op MemorySegmentFillUnsafe.panama false 6 avgt 10 2.015 ? 0.008 ns/op MemorySegmentFillUnsafe.panama false 7 avgt 10 2.138 ? 0.035 ns/op MemorySegmentFillUnsafe.panama false 8 avgt 10 2.553 ? 0.005 ns/op MemorySegmentFillUnsafe.panama false 15 avgt 10 3.178 ? 0.002 ns/op MemorySegmentFillUnsafe.panama false 16 avgt 10 2.775 ? 0.005 ns/op MemorySegmentFillUnsafe.panama false 63 avgt 10 4.296 ? 0.007 ns/op MemorySegmentFillUnsafe.panama false 64 avgt 10 4.290 ? 0.001 ns/op MemorySegmentFillUnsafe.panama false 255 avgt 10 6.334 ? 0.013 ns/op MemorySegmentFillUnsafe.panama false 256 avgt 10 5.472 ? 0.009 ns/op MemorySegmentFillUnsafe.unsafe true 1 avgt 10 3.218 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 2 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 3 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 4 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 5 avgt 10 2.860 ? 
0.001 ns/op MemorySegmentFillUnsafe.unsafe true 6 avgt 10 2.503 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 7 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 8 avgt 10 2.145 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 15 avgt 10 2.886 ? 0.100 ns/op MemorySegmentFillUnsafe.unsafe true 16 avgt 10 2.145 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe true 63 avgt 10 3.781 ? 0.013 ns/op MemorySegmentFillUnsafe.unsafe true 64 avgt 10 2.735 ? 0.016 ns/op MemorySegmentFillUnsafe.unsafe true 255 avgt 10 5.079 ? 0.014 ns/op MemorySegmentFillUnsafe.unsafe true 256 avgt 10 4.007 ? 0.112 ns/op MemorySegmentFillUnsafe.unsafe false 1 avgt 10 3.218 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 2 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 3 avgt 10 2.861 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 4 avgt 10 2.864 ? 0.016 ns/op MemorySegmentFillUnsafe.unsafe false 5 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 6 avgt 10 2.503 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 7 avgt 10 2.860 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 8 avgt 10 2.145 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 15 avgt 10 2.571 ? 0.040 ns/op MemorySegmentFillUnsafe.unsafe false 16 avgt 10 2.146 ? 0.001 ns/op MemorySegmentFillUnsafe.unsafe false 63 avgt 10 4.531 ? 0.021 ns/op MemorySegmentFillUnsafe.unsafe false 64 avgt 10 5.134 ? 0.099 ns/op MemorySegmentFillUnsafe.unsafe false 255 avgt 10 6.603 ? 0.031 ns/op MemorySegmentFillUnsafe.unsafe false 256 avgt 10 7.148 ? 
0.025 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866747058 PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866751204 From jbhateja at openjdk.org Fri May 9 15:17:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 15:17:17 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v19] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 19 commits: - Sandhya's review comments resoultion - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Addressing Yudi's comments - Code re-factoring from Vladimir - Reveiw suggestions incorporated - Making _features_bitmap size configurable - cleanups & refactorings - build fixes for non-x86 targets - Review comments resolutions - Updating comment - ... and 9 more: https://git.openjdk.org/jdk/compare/411a63ea...f583a521 ------------- Changes: https://git.openjdk.org/jdk/pull/24329/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=18 Stats: 520 lines in 15 files changed: 271 ins; 29 del; 220 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From aph at openjdk.org Fri May 9 15:20:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 9 May 2025 15:20:53 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley wrote: > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). > > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ? 0.422 ns/op > MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ? 80.161 ns/op > MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ? 0.180 ns/op > MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ? 
0.232 ns/op Apple M1: Benchmark (ELEM_SIZE) Mode Cnt Score Error Units SegmentBulkFill.heapSegmentFillJava 2 avgt 10 1.727 ? 0.017 ns/op SegmentBulkFill.heapSegmentFillJava 3 avgt 10 1.721 ? 0.002 ns/op SegmentBulkFill.heapSegmentFillJava 4 avgt 10 1.876 ? 0.002 ns/op SegmentBulkFill.heapSegmentFillJava 5 avgt 10 1.876 ? 0.001 ns/op SegmentBulkFill.heapSegmentFillJava 6 avgt 10 1.876 ? 0.002 ns/op SegmentBulkFill.heapSegmentFillJava 7 avgt 10 1.876 ? 0.002 ns/op SegmentBulkFill.heapSegmentFillJava 8 avgt 10 2.502 ? 0.003 ns/op SegmentBulkFill.heapSegmentFillJava 64 avgt 10 4.064 ? 0.002 ns/op SegmentBulkFill.heapSegmentFillJava 512 avgt 10 6.601 ? 0.051 ns/op SegmentBulkFill.heapSegmentFillJava 4096 avgt 10 44.050 ? 0.076 ns/op SegmentBulkFill.heapSegmentFillJava 32768 avgt 10 330.328 ? 0.450 ns/op SegmentBulkFill.heapSegmentFillJava 262144 avgt 10 4138.154 ? 6.509 ns/op SegmentBulkFill.heapSegmentFillJava 2097152 avgt 10 33089.966 ? 48.068 ns/op SegmentBulkFill.heapSegmentFillJava 16777216 avgt 10 352669.548 ? 571.433 ns/op SegmentBulkFill.heapSegmentFillJava 134217728 avgt 10 4482510.192 ? 7177.637 ns/op SegmentBulkFill.heapSegmentFillLoop 2 avgt 10 1.977 ? 0.003 ns/op SegmentBulkFill.heapSegmentFillLoop 3 avgt 10 3.447 ? 0.002 ns/op SegmentBulkFill.heapSegmentFillLoop 4 avgt 10 4.073 ? 0.042 ns/op SegmentBulkFill.heapSegmentFillLoop 5 avgt 10 4.377 ? 0.004 ns/op SegmentBulkFill.heapSegmentFillLoop 6 avgt 10 5.337 ? 0.071 ns/op SegmentBulkFill.heapSegmentFillLoop 7 avgt 10 5.629 ? 0.004 ns/op SegmentBulkFill.heapSegmentFillLoop 8 avgt 10 5.947 ? 0.010 ns/op SegmentBulkFill.heapSegmentFillLoop 64 avgt 10 8.127 ? 0.003 ns/op SegmentBulkFill.heapSegmentFillLoop 512 avgt 10 16.045 ? 0.027 ns/op SegmentBulkFill.heapSegmentFillLoop 4096 avgt 10 46.627 ? 0.164 ns/op SegmentBulkFill.heapSegmentFillLoop 32768 avgt 10 333.233 ? 1.040 ns/op SegmentBulkFill.heapSegmentFillLoop 262144 avgt 10 4134.009 ? 
11.125 ns/op SegmentBulkFill.heapSegmentFillLoop 2097152 avgt 10 33148.671 ? 322.905 ns/op SegmentBulkFill.heapSegmentFillLoop 16777216 avgt 10 343832.913 ? 233.881 ns/op SegmentBulkFill.heapSegmentFillLoop 134217728 avgt 10 4475821.911 ? 6101.380 ns/op SegmentBulkFill.heapSegmentFillUnsafe 2 avgt 10 3.133 ? 0.034 ns/op SegmentBulkFill.heapSegmentFillUnsafe 3 avgt 10 3.130 ? 0.005 ns/op SegmentBulkFill.heapSegmentFillUnsafe 4 avgt 10 3.128 ? 0.004 ns/op SegmentBulkFill.heapSegmentFillUnsafe 5 avgt 10 3.139 ? 0.030 ns/op SegmentBulkFill.heapSegmentFillUnsafe 6 avgt 10 3.135 ? 0.035 ns/op SegmentBulkFill.heapSegmentFillUnsafe 7 avgt 10 3.135 ? 0.030 ns/op SegmentBulkFill.heapSegmentFillUnsafe 8 avgt 10 2.665 ? 0.006 ns/op SegmentBulkFill.heapSegmentFillUnsafe 64 avgt 10 2.841 ? 0.032 ns/op SegmentBulkFill.heapSegmentFillUnsafe 512 avgt 10 6.246 ? 0.100 ns/op SegmentBulkFill.heapSegmentFillUnsafe 4096 avgt 10 41.241 ? 0.107 ns/op SegmentBulkFill.heapSegmentFillUnsafe 32768 avgt 10 331.001 ? 4.521 ns/op SegmentBulkFill.heapSegmentFillUnsafe 262144 avgt 10 3038.808 ? 29.750 ns/op SegmentBulkFill.heapSegmentFillUnsafe 2097152 avgt 10 21996.375 ? 2617.947 ns/op SegmentBulkFill.heapSegmentFillUnsafe 16777216 avgt 10 241814.864 ? 24300.854 ns/op SegmentBulkFill.heapSegmentFillUnsafe 134217728 avgt 10 2811655.392 ? 24737.911 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2866961810 From mhaessig at openjdk.org Fri May 9 15:22:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SO+/vXNzaWc=?=) Date: Fri, 9 May 2025 15:22:56 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v2] In-Reply-To: References: Message-ID: > # Issue Summary > > This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. 
Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: > > getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" > areturn; > // The following is unreachable > iconst_0; > > > This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. does not fall through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall through: > https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 > > # Change Summary > > To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that do not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test.
>
> # Testing
>
> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439)
> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing
>
> # Acknowledgements
> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms.

Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision:

 - Add lookupswitch and tableswitch
 - Elaborate why we need that class file version
 - Reorganized tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25118/files
  - new: https://git.openjdk.org/jdk/pull/25118/files/53cec97e..564d2fca

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25118&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25118&range=00-01

  Stats: 443 lines in 7 files changed: 198 ins; 245 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25118.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25118/head:pull/25118

PR: https://git.openjdk.org/jdk/pull/25118

From mhaessig at openjdk.org Fri May 9 15:22:57 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Fri, 9 May 2025 15:22:57 GMT
Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v2]
In-Reply-To: <0W1NJ3CRAdeugnJTYVGVtomqYJEX5QdVEua9XPSWn5g=.d5b30054-2805-4b8c-a9f9-5b1cdbc12d2a@github.com>
References: <0W1NJ3CRAdeugnJTYVGVtomqYJEX5QdVEua9XPSWn5g=.d5b30054-2805-4b8c-a9f9-5b1cdbc12d2a@github.com>
Message-ID: <82PkAUAxT8KzDxLPJRzpexMxjpCIHkEhId71UfTojIw=.daba032d-cb7c-4220-ab22-53162487a0a2@github.com>

On Thu, 8 May 2025 14:54:05 GMT, Emanuel Peter wrote:

> For the test: It's a bit of a shame to have lots of separate files.

I got myself confused with class visibilities. I moved the `A` and `B` into the java test file and moved both of them one directory lower into `compiler/interpreter`. That makes it a lot cleaner.

> Or else argue why it CANNOT be done.
I looked into it some more and there are two places where we deopt and move the `bci` over to the next bytecode: when an object of an unloaded class is returned by `getstatic` (see code in the PR description) and calls (see below) and the object reference is `null`. https://github.com/openjdk/jdk/blob/411a63ea1b0c6e8bfea219427bf1c317c5dadabf/src/hotspot/share/opto/doCall.cpp#L770-L785 Since `{d,f,i,l}return`, `{table,lookup}switch`, and `ret` require an integer on the stack but only bytecodes that push references on the stack can deopt to the next `bci`, we cannot trigger this error for those bytecodes. Now, that begs the question, whether these bytecodes should then be in `falls_through()`. I argue that they should be, since that would be the correct behavior if we deopted at such a bytecode. > Did you go through all bytecodes we support here? I did, but I missed `lookupswitch` and `tableswitch`. `jsr` just pushes the address of the next bytecode onto the stack. `ret` can jump to such an address, but I already added that to `falls_through()`. 
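
To make the argument above concrete, here is a minimal, self-contained sketch of the kind of check being discussed. It is not the HotSpot source: the `Bytecode` enum is a hypothetical, reduced stand-in for `Bytecodes::Code`, it lists only the bytecodes named in this thread, and including `goto`/`goto_w` is an assumption about what the existing list already contained.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for HotSpot's Bytecodes::Code.
// Only bytecodes named in this thread are listed; goto_/goto_w are assumed.
enum class Bytecode {
  getstatic, iconst_0,
  goto_, goto_w, athrow, ret,
  ireturn, lreturn, freturn, dreturn, areturn,
  tableswitch, lookupswitch
};

// Returns false for bytecodes after which execution never continues at the
// textually next bytecode; returns true for everything else.
bool falls_through(Bytecode bc) {
  switch (bc) {
    case Bytecode::goto_:
    case Bytecode::goto_w:
    case Bytecode::athrow:
    case Bytecode::ret:
    case Bytecode::ireturn:
    case Bytecode::lreturn:
    case Bytecode::freturn:
    case Bytecode::dreturn:
    case Bytecode::areturn:
    case Bytecode::tableswitch:
    case Bytecode::lookupswitch:
      return false;
    default:
      return true;
  }
}
```

With the sequence from the PR description (`getstatic`, `areturn`, `iconst_0`), `falls_through(Bytecode::areturn)` is `false`, so a verifier consulting this function would have no reason to look at the unreachable `iconst_0`.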
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2866965280

From aph at openjdk.org Fri May 9 15:30:53 2025
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 9 May 2025 15:30:53 GMT
Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory
In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com>
References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com>
Message-ID: <2yBNv1pe_7PYsML-ySuw5EMb2cqamuF1wDiDgRtNP3Y=.0fa1c070-d627-472f-9255-376d3a4bd7b6@github.com>

On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley wrote:

> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op

Graviton 4:

Benchmark                              (ELEM_SIZE)  Mode  Cnt        Score      Error  Units
SegmentBulkFill.heapSegmentFillJava              2  avgt   10        2.324 ±    0.066  ns/op
SegmentBulkFill.heapSegmentFillJava              3  avgt   10        2.427 ±    0.031  ns/op
SegmentBulkFill.heapSegmentFillJava              4  avgt   10        2.231 ±    0.009  ns/op
SegmentBulkFill.heapSegmentFillJava              5  avgt   10        2.523 ±    0.040  ns/op
SegmentBulkFill.heapSegmentFillJava              6  avgt   10        2.632 ±    0.017  ns/op
SegmentBulkFill.heapSegmentFillJava              7  avgt   10        2.394 ±    0.007  ns/op
SegmentBulkFill.heapSegmentFillJava              8  avgt   10        3.004 ±    0.032  ns/op
SegmentBulkFill.heapSegmentFillJava             64  avgt   10        4.813 ±    0.417  ns/op
SegmentBulkFill.heapSegmentFillJava            512  avgt   10        9.151 ±    0.040  ns/op
SegmentBulkFill.heapSegmentFillJava           4096  avgt   10       60.127 ±    0.078  ns/op
SegmentBulkFill.heapSegmentFillJava          32768  avgt   10      461.292 ±    2.127  ns/op
SegmentBulkFill.heapSegmentFillJava         262144  avgt   10     3666.851 ±    0.280  ns/op
SegmentBulkFill.heapSegmentFillJava        2097152  avgt   10    35169.510 ±   22.507  ns/op
SegmentBulkFill.heapSegmentFillJava       16777216  avgt   10   227182.710 ±  903.546  ns/op
SegmentBulkFill.heapSegmentFillJava      134217728  avgt   10  1946761.410 ± 3033.447  ns/op
SegmentBulkFill.heapSegmentFillLoop              2  avgt   10        2.902 ±    0.038  ns/op
SegmentBulkFill.heapSegmentFillLoop              3  avgt   10        3.870 ±    0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              4  avgt   10        5.438 ±    0.013  ns/op
SegmentBulkFill.heapSegmentFillLoop              5  avgt   10        5.714 ±    0.033  ns/op
SegmentBulkFill.heapSegmentFillLoop              6  avgt   10        5.748 ±    0.019  ns/op
SegmentBulkFill.heapSegmentFillLoop              7  avgt   10        5.909 ±    0.004  ns/op
SegmentBulkFill.heapSegmentFillLoop              8  avgt   10        6.330 ±    0.295  ns/op
SegmentBulkFill.heapSegmentFillLoop             64  avgt   10        8.769 ±    0.003  ns/op
SegmentBulkFill.heapSegmentFillLoop            512  avgt   10       16.935 ±    0.007  ns/op
SegmentBulkFill.heapSegmentFillLoop           4096  avgt   10       57.822 ±    0.510  ns/op
SegmentBulkFill.heapSegmentFillLoop          32768  avgt   10      376.849 ±    0.311  ns/op
SegmentBulkFill.heapSegmentFillLoop         262144  avgt   10     3059.064 ±    0.419  ns/op
SegmentBulkFill.heapSegmentFillLoop        2097152  avgt   10    24398.571 ±    8.618  ns/op
SegmentBulkFill.heapSegmentFillLoop       16777216  avgt   10   225721.136 ±  608.041  ns/op
SegmentBulkFill.heapSegmentFillLoop      134217728  avgt   10  1940987.569 ± 2156.239  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            2  avgt   10        3.628 ±    0.022  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            3  avgt   10        3.670 ±    0.011  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            4  avgt   10        3.583 ±    0.002  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            5  avgt   10        3.651 ±    0.016  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            6  avgt   10        3.659 ±    0.015  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            7  avgt   10        3.687 ±    0.016  ns/op
SegmentBulkFill.heapSegmentFillUnsafe            8  avgt   10        3.193 ±    0.022  ns/op
SegmentBulkFill.heapSegmentFillUnsafe           64  avgt   10        3.365 ±    0.034  ns/op
SegmentBulkFill.heapSegmentFillUnsafe          512  avgt   10        6.443 ±    0.006  ns/op
SegmentBulkFill.heapSegmentFillUnsafe         4096  avgt   10       48.261 ±    0.081  ns/op
SegmentBulkFill.heapSegmentFillUnsafe        32768  avgt   10      389.793 ±    0.777  ns/op
SegmentBulkFill.heapSegmentFillUnsafe       262144  avgt   10     3123.758 ±    1.048  ns/op
SegmentBulkFill.heapSegmentFillUnsafe      2097152  avgt   10    25039.904 ±   55.467  ns/op
SegmentBulkFill.heapSegmentFillUnsafe     16777216  avgt   10   223579.037 ±  306.005  ns/op
SegmentBulkFill.heapSegmentFillUnsafe    134217728  avgt   10  1931370.983 ± 1110.364  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2867002071

From aph at openjdk.org Fri May 9 15:39:35 2025
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 9 May 2025 15:39:35 GMT
Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2]
In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com>
References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com>
Message-ID:

> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op

Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:

  generate_unsafecopy_common_error_exit

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25147/files
  - new: https://git.openjdk.org/jdk/pull/25147/files/c3d4c414..1078ba8c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=00-01

  Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/25147.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147

PR: https://git.openjdk.org/jdk/pull/25147

From mchevalier at openjdk.org Fri May 9 16:08:55 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 9 May 2025 16:08:55 GMT
Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls
In-Reply-To:
References:
Message-ID:

On Tue, 6 May 2025 18:18:08 GMT, Vladimir Ivanov wrote:

> making them expensive nodes (to avoid commoning during GVN)

Good point!

I still think I don't get everything. Let me try to sum up what I think I should do.

For now, I don't want to mess with control, but I should prepare the field. Using general Call nodes for pure calls was pretty difficult: Call nodes have too much opinion, assumptions to easily work with for pure calls. But eventually, I want to change the nodes I'm using into a Call node, and more precisely a CallLeaf (I suspect once I'm done doing all I can do with pure calls, so in macro expansion, it's fine). To be able to do this transformation, I need to know control at this point. My goal is to start with control-less nodes, but find the late control during loop optimization, control-pin them at this point (because that's when the information is available) with both control input and output (needed for the expansion in CallLeaf), and continuing with control-pinned nodes. For now, I'm happy with the control I get from parsing.
So, under my nodes, I need 2 outputs: control and data (everywhere now, and at least after control-pinning in the follow-up). I should then make ModFloating/ModD/ModF sub-classes of `MultiNode` (I guess I can make ModFloating a direct sub-class of `MultiNode`). And I can introduce new node types for native math calls that would behave similarly with respect to elimination (and pinning in the future), and would also expand into `CallLeaf`.

A weirdness of these nodes is that they would be CFG nodes or not depending on whether they are already pinned, rather than on their type, but I'm not aware of a fundamental issue with that, as long as the change doesn't happen in the middle of a phase where it's relevant.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2867105355

From mchevalier at openjdk.org Fri May 9 16:29:07 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 9 May 2025 16:29:07 GMT
Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder
Message-ID:

Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation.

Here, the problem is that the array allocation function, which throws when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers`, which uses the compiled handlers instead of deopting, was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag has been used in testing since October 2022. So maybe it's time to use the compiled exception handlers! I propose to turn them on by default and to add a diagnostic flag that deopts instead, in case something goes wrong.

Doing so improves the performance to match that of C1 (both for direct array allocation and for `StringBuilder` construction).
For instance, with the case given in the JBS issue:

Stop at level 0
CompileCommand: compileonly C.test* bool compileonly = true

real    0m4,277s
user    0m4,214s
sys     0m0,117s

Stop at level 1
CompileCommand: compileonly C.test* bool compileonly = true

real    0m4,104s
user    0m4,079s
sys     0m0,106s

Stop at level 2
CompileCommand: compileonly C.test* bool compileonly = true

real    0m4,308s
user    0m4,239s
sys     0m0,145s

Stop at level 3
CompileCommand: compileonly C.test* bool compileonly = true

real    0m4,304s
user    0m4,247s
sys     0m0,132s

Default (Stop at level 4)
CompileCommand: compileonly C.test* bool compileonly = true

real    0m4,086s
user    0m4,059s
sys     0m0,122s

I've run some tests (up to tier10), and everything seems fine, ignoring the usual noise. I've checked with @dougxc: it shouldn't impact Graal, as it doesn't use `OptoRuntime`.

-------------

Commit messages:
 - Change dev StressCompiledExceptionHandlers into diagnostic DeoptimizeOnAllocationException

Changes: https://git.openjdk.org/jdk/pull/25149/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25149&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353638
  Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/25149.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25149/head:pull/25149

PR: https://git.openjdk.org/jdk/pull/25149

From mchevalier at openjdk.org Fri May 9 16:29:28 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Fri, 9 May 2025 16:29:28 GMT
Subject: RFR: 8355488: Add stress mode for C2 loop peeling
Message-ID:

Adding a `StressLoopPeeling` dev flag that randomizes peeling.

## Semantics

For now, the direction I've taken is to randomly take a decision in case of peeling; otherwise, rely on the existing heuristics. This requires distinguishing two things:
- not peeling because it's not legal: see for instance
```cpp
assert(cl->trip_count() > 0, "peeling a fully unrolled loop");
```
in `PhaseIdealLoop::do_peeling`
- not peeling because it doesn't seem profitable.
Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it!

Peeling too many times is not a great idea either. It uses a lot of memory and a lot of nodes. Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel", then given enough requests we will always reach the bound. If we limit the number of requests, we get a more evenly distributed amount of peeling, between 0 and the bound.

I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. The bound only avoids some legitimate memory problems. Without a bound on the number of peeling opportunities, HotSpot eats a lot of memory, but all the allocations seem reasonable: it just seems we ask for too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags.

## The Flag

The flag is very specialized, unlike what a `StressLoopOpts` would be. My idea so far is "let's see". I think it's good to be able to enable stress optimizations selectively, and to have a flag like `StressLoopOpts` that would turn them all on: we could use the general one in testing, and the finer-grained ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. But once again: let's see what happens.
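
The difference between the two bounding strategies described under "Semantics" can be illustrated with a small stand-alone model. This is only a sketch and not code from the PR: the function names, the fair-coin decision, and the use of `std::mt19937` are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Policy A (hypothetical): cap the number of *accepted* peels.
// Given enough requests, the cap is always reached.
int peels_with_accept_cap(std::mt19937& rng, int requests, int cap) {
  std::bernoulli_distribution coin(0.5);  // assumed fair "peel or not" decision
  int accepted = 0;
  for (int i = 0; i < requests && accepted < cap; ++i) {
    if (coin(rng)) {
      ++accepted;
    }
  }
  return accepted;
}

// Policy B (hypothetical): cap the number of *opportunities* to peel.
// Only the first `cap` requests get a random decision; later ones are refused.
// The result lies anywhere between 0 and `cap`.
int peels_with_opportunity_cap(std::mt19937& rng, int requests, int cap) {
  std::bernoulli_distribution coin(0.5);
  const int opportunities = std::min(requests, cap);
  int accepted = 0;
  for (int i = 0; i < opportunities; ++i) {
    if (coin(rng)) {
      ++accepted;
    }
  }
  return accepted;
}
```

With many requests, policy A saturates at the cap on practically every run, whereas policy B spreads the number of peels over the whole range from 0 to the cap, which is the behaviour the description argues for.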

## On the Code

The field `_peeling_rounds_of_node` as an associative array is not very nice. I'd like something like a map `Node -> integer` of some sort, but I couldn't find anything that would allow sparse keys. I had to stuff it in something that would live across peeling-able phases, and there aren't that many such objects: mostly `PhaseIterGVN` and `Compile`. If anyone has another idea, I'd be happy to move it.

-------------

Commit messages:
 - interface
 - Limit peeling
 - A simpler implementation
 - Peel more!
 - Stress loop peeling

Changes: https://git.openjdk.org/jdk/pull/25140/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8355488
  Stats: 37 lines in 4 files changed: 35 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/25140.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25140/head:pull/25140

PR: https://git.openjdk.org/jdk/pull/25140

From rcastanedalo at openjdk.org Fri May 9 17:04:53 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 9 May 2025 17:04:53 GMT
Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 13:46:09 GMT, Daniel Lundén wrote:

>> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented.
>>
>> ### Changeset
>>
>> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM).
The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate.
>> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`.
>> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places.
>> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`:
>>   - Clean up how we identify the search root (avoid mutation).
>>   - Add a missing early exit for `Phi` nodes when `LCA == early`.
>>
>> ### Testing
>>
>> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111)
>> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.
>
> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revise explanation of non_early_stores

Looks good (modulo the grammar glitch), thanks!

src/hotspot/share/opto/gcm.cpp line 786:

> 784:   // therefore, stores we find may be in blocks that are on completely distinct
> 785:   // control-flow paths compared to early. However, in the end, only stores in
> 786:   // blocks dominated by early matters. The reason for bookkeeping not only

Suggestion:

  // blocks dominated by early matter. The reason for bookkeeping not only

-------------

Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2829090692 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2082116605 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v12] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Tracking UMH state more accurately - Rework for safer concurrency - Merge branch 'master' into JDK-8231269-compile-task-weaks - Move to oops - Improve get_method_blocker - Simplify a bit - Merge branch 'master' into JDK-8231269-compile-task-weaks - Do not accept nullptr methods - Attempt at phasing doc - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... and 12 more: https://git.openjdk.org/jdk/compare/ad07426f...1cdbed2b ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11 Stats: 393 lines in 11 files changed: 331 ins; 25 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From shade at openjdk.org Fri May 9 17:08:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 9 May 2025 17:08:42 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v11] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 07:23:39 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. 
Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>>
>> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
>>
>> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code
>> - [x] Linux x86_64 server fastdebug, `all`
>> - [x] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Move to oops

So... Following up on one forgotten `methodHandle` removal (https://github.com/openjdk/jdk/pull/24018#discussion_r2081467353) got me into a rabbit hole of making the new utility class thread-safe. Otherwise, there are TOCTOU issues checking `(Weak)Handle` status, which gets us in trouble real quick. This normally happens in current tests when an external thread goes into `CompilerBroker::wait_for_compilation()` and a compiler thread starts moving the `UMH` state for compilation. Relying on un-synchronized `Weak(Handle)` state is not nice either.

The answer to all these problems is to track the `UMH` state more accurately, and thus trust `WeakHandle` only sporadically. This is now done in a new commit. This also allows for more explicit state checks. And, this allows clearly catching when we try to access `method()` after `release()` -- which, surprisingly, happens for `hot_method()`, which is not always re-initialized.

Chasing this bug also made my head hurt a bit about double-negating `!is_unloaded` checks. It is technically a safety check, so I renamed methods to reflect that: `is_safe`, `make_always_safe`.

I will schedule weekend tests for this PR on various machines to see if more bugs fall out once I shake that particular tree.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2867309949

From dlunden at openjdk.org Fri May 9 18:10:09 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 9 May 2025 18:10:09 GMT
Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6]
In-Reply-To:
References:
Message-ID:

> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented.
>
> ### Changeset
>
> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate.
> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`.
> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places.
> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`:
>   - Clean up how we identify the search root (avoid mutation).
>   - Add a missing early exit for `Phi` nodes when `LCA == early`.
>
> ### Testing
>
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/gcm.cpp

  Co-authored-by: Roberto Castañeda Lozano

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24926/files
  - new: https://git.openjdk.org/jdk/pull/24926/files/a4a6a778..11e37390

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=04-05

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/24926.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926

PR: https://git.openjdk.org/jdk/pull/24926

From dlunden at openjdk.org Fri May 9 18:10:09 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Fri, 9 May 2025 18:10:09 GMT
Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 17:02:06 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Revise explanation of non_early_stores
>
> src/hotspot/share/opto/gcm.cpp line 786:
>
>> 784:   // therefore, stores we find may be in blocks that are on completely distinct
>> 785:   // control-flow paths compared to early.
However, in the end, only stores in >> 786: // blocks dominated by early matters. The reason for bookkeeping not only > > Suggestion: > > // blocks dominated by early matter. The reason for bookkeeping not only Ah, thanks. Updated! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2082244895 From kvn at openjdk.org Fri May 9 21:57:31 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 9 May 2025 21:57:31 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms Message-ID: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. AOT code caching should be limited to supported platforms: x64 and aarch64. Testing: GHA ------------- Commit messages: - 8356192: Enable AOT code caching only on supported platforms Changes: https://git.openjdk.org/jdk/pull/25158/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25158&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356192 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25158/head:pull/25158 PR: https://git.openjdk.org/jdk/pull/25158 From mdoerr at openjdk.org Fri May 9 22:29:51 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 9 May 2025 22:29:51 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms In-Reply-To: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: <322ylX1ymxfEkkQWli70h-6hgjAZc9sdDZRCb6MALTA=.0a6e4775-240a-4069-8668-175dc4f81f63@github.com> On Fri, 9 May 2025 21:53:24 GMT, Vladimir Kozlov wrote: > @TheRealMDoerr reported failures in 
`runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. > > AOT code caching should be limited to supported platforms: x64 and aarch64. > > Testing: GHA Fixing it should also be possible. See https://github.com/openjdk/jdk/pull/25143. On the other hand, this feature is probably not important to have on PPC64. So, disabling `AOTAdapterCaching` should be ok. However, we also need to disable the tests if we decide to do that. java.lang.RuntimeException: 'Adapters: total' missing from stdout/stderr at jdk.test.lib.process.OutputAnalyzer.shouldContain(OutputAnalyzer.java:253) at AOTCodeFlags$Tester.checkExecution(AOTCodeFlags.java:101) at jdk.test.lib.cds.CDSAppTester.executeAndCheck(CDSAppTester.java:203) at jdk.test.lib.cds.CDSAppTester.createAOTCache(CDSAppTester.java:297) at jdk.test.lib.cds.CDSAppTester.runAOTWorkflow(CDSAppTester.java:431) at jdk.test.lib.cds.CDSAppTester.run(CDSAppTester.java:407) at AOTCodeFlags.main(AOTCodeFlags.java:52) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104) at java.base/java.lang.reflect.Method.invoke(Method.java:565) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:335) at java.base/java.lang.Thread.run(Thread.java:1447) ------------- PR Review: https://git.openjdk.org/jdk/pull/25158#pullrequestreview-2829795294 From sviswanathan at openjdk.org Fri May 9 22:55:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 22:55:58 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v19] In-Reply-To: References: Message-ID: <8y5JLR_7BMUJXmNNzPRusDpRWnJHtIPxZodVqQHrmmI=.ca53a482-8572-499f-af9f-6c255cf02896@github.com> On Fri, 9 May 2025 15:17:17 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? 
products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 19 commits: > > - Sandhya's review comments resoultion > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 > - Addressing Yudi's comments > - Code re-factoring from Vladimir > - Reveiw suggestions incorporated > - Making _features_bitmap size configurable > - cleanups & refactorings > - build fixes for non-x86 targets > - Review comments resolutions > - Updating comment > - ... and 9 more: https://git.openjdk.org/jdk/compare/411a63ea...f583a521 Rest of the PR looks good to me.
src/hotspot/cpu/x86/vm_version_x86.cpp line 494: > 492: if (use_evex) { > 493: // check _cpuid_info.sef_cpuid7_ebx.bits.avx512f > 494: // OR check _cpuid_info.std_cpuid24_ebx.bits.avx10 This comment needs to be corrected: // OR check _cpuid_info.sefsl1_cpuid7_edx.bits.avx10 src/hotspot/cpu/x86/vm_version_x86.cpp line 1052: > 1050: if (is_intel()) { // Intel cpus specific settings > 1051: if (is_knights_family()) { > 1052: _features.clear_feature(CPU_VZEROUPPER); Should we be also clearing the CPU_AVX10_1 and CPU_AVX10_2 here? ------------- PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2829142420 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2082148591 PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2082570611 From kvn at openjdk.org Fri May 9 23:23:04 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 9 May 2025 23:23:04 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: > @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. > > AOT code caching should be limited to supported platforms: x64 and aarch64. 
> > Testing: GHA Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Limit platforms to run AOTCode test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25158/files - new: https://git.openjdk.org/jdk/pull/25158/files/6241b222..0394bbc6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25158&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25158&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25158/head:pull/25158 PR: https://git.openjdk.org/jdk/pull/25158 From jbhateja at openjdk.org Fri May 9 23:36:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:16 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v20] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids.
State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/f583a521..b4654fa4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=18-19 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From sviswanathan at openjdk.org Fri May 9 23:36:16 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 9 May 2025 23:36:16 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v20] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 23:33:42 GMT, Jatin Bhateja wrote: >> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. >> - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. >> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. >> - An early version of Intel® AVX10 (Version 1, or Intel®
AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. >> >> This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. >> >> The patch has been regressed through tier1 and jvmci tests >> >> Please review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24329#pullrequestreview-2829900271 From jbhateja at openjdk.org Fri May 9 23:36:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:16 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v10] In-Reply-To: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com> References: <3t1R35B9bafRtfvqfE7D2dAeLrjaDukXlDUGb-3VtaA=.46d64318-e9fb-4bf3-8a68-8dba2c2b7b26@github.com> Message-ID: On Sat, 3 May 2025 08:13:11 GMT, Vladimir Ivanov wrote: >>> Ok, thanks! I wasn't sure you finished the pass. >>> >>> I'm still seeing dynamic memory allocation which IMO unnecessarily complicates the implementation. Bitmap size is fixed and well-known at compile time. It enables `VM_Feature` class to embed the array of proper size inline. And it eliminates all the problems related to undesired sharing of backed array. (Also, `pre_initialize()` is not needed as well.)
>> >> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit? > >> Bitmap size depends on the maximum feature enum value, I made it dynamic to keep it flexible. Do you want the feature vector size to be made constant and manually bump it when we exhaust the limit? > > Yes, please. (The limit may be precise - number of elements in Feature_Flag enum - but the logic which computes the size of backing array can automatically round it and bump the size once the actual limit is reached.) > >> pre_initialize was put in place because codeCache_init() proceeds VM_Version_init() > > I wanted to say that the sole purpose of `pre_initialize` is to allocate memory. Once it goes away, there's no reason to keep it. Thanks @iwanowww , @sviswa7 , @mur47x111 , @merykitty for your reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24329#issuecomment-2868092244 From jbhateja at openjdk.org Fri May 9 23:36:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:17 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v19] In-Reply-To: <8y5JLR_7BMUJXmNNzPRusDpRWnJHtIPxZodVqQHrmmI=.ca53a482-8572-499f-af9f-6c255cf02896@github.com> References: <8y5JLR_7BMUJXmNNzPRusDpRWnJHtIPxZodVqQHrmmI=.ca53a482-8572-499f-af9f-6c255cf02896@github.com> Message-ID: On Fri, 9 May 2025 22:23:41 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 19 commits: >> >> - Sandhya's review comments resoultion >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 >> - Addressing Yudi's comments >> - Code re-factoring from Vladimir >> - Reveiw suggestions incorporated >> - Making _features_bitmap size configurable >> - cleanups & refactorings >> - build fixes for non-x86 targets >> - Review comments resolutions >> - Updating comment >> - ... and 9 more: https://git.openjdk.org/jdk/compare/411a63ea...f583a521 > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1052: > >> 1050: if (is_intel()) { // Intel cpus specific settings >> 1051: if (is_knights_family()) { >> 1052: _features.clear_feature(CPU_VZEROUPPER); > > Should we be also clearing the CPU_AVX10_1 and CPU_AVX10_2 here? I agree; it may help validate KNL on Diamond Rapids :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2082628062 From jbhateja at openjdk.org Fri May 9 23:36:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 9 May 2025 23:36:17 GMT Subject: Integrated: 8352675: Support Intel AVX10 converged vector ISA feature detection In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:57:22 GMT, Jatin Bhateja wrote: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel®
AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html This pull request has now been integrated. Changeset: 3b336a9d Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/3b336a9da091c4df4373d2b845b60d2a7a4e3b1d Stats: 522 lines in 15 files changed: 273 ins; 29 del; 220 mod 8352675: Support Intel AVX10 converged vector ISA feature detection Reviewed-by: sviswanathan, vlivanov, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/24329 From dlong at openjdk.org Fri May 9 23:58:51 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 9 May 2025 23:58:51 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v2] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 15:22:56 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode.
When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. 
>> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Add lookupswitch and tableswitch > - Elaborate why we need that class file version > - Reorganized tests BTW, I think determining whether the instruction after a `jsr` is reachable is not so easy. Normally if the subroutine does a `ret` then we return to the instruction after the jsr, but I think the subroutine can also do things like throw an exception or loop forever, so it's probably better to treat `jsr` the same as `goto` and add it to `falls_through`. But now that I think about it, I'm wondering if this special logic is still needed by VerifyStack. It might not be needed at all, or it might only be needed for specific situations, like an invoke instruction or an instruction with the reexecute flag set. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2868114758 From dlong at openjdk.org Sat May 10 00:04:50 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 10 May 2025 00:04:50 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v2] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 15:22:56 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode.
When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. 
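As a rough illustration of the change summary quoted above, the following Java sketch models a fall-through check over bytecode mnemonics. It is not the HotSpot code: the real `falls_through()` is C++ in `deoptimization.cpp`, and the class name `FallsThroughSketch`, the method name `fallsThrough`, and the exact set of mnemonics (in particular `goto_w` and plain `return`) are assumptions assembled from the bytecodes mentioned in this thread.

```java
import java.util.Set;

public class FallsThroughSketch {
    // Mnemonics after which execution never continues at the textually next
    // bytecode. Hypothetical set for illustration only; it is based on the
    // bytecodes named in this thread, not copied from HotSpot.
    private static final Set<String> NO_FALL_THROUGH = Set.of(
        "goto", "goto_w", "athrow",
        "areturn", "dreturn", "freturn", "ireturn", "lreturn", "return",
        "lookupswitch", "tableswitch");

    // Returns true if control can continue at the next bytecode in the method.
    public static boolean fallsThrough(String mnemonic) {
        return !NO_FALL_THROUGH.contains(mnemonic);
    }

    public static void main(String[] args) {
        System.out.println("getstatic falls through: " + fallsThrough("getstatic"));
        System.out.println("areturn falls through: " + fallsThrough("areturn"));
    }
}
```

Under this model, a verifier would skip checking the bytecode that follows `areturn`, since that bytecode may sit in an unreachable basic block.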
>> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Add lookupswitch and tableswitch > - Elaborate why we need that class file version > - Reorganized tests 957 // The interpreter oop map generator reports results before 958 // the current bytecode has executed except in the case of 959 // calls. It seems to be hard to tell whether the compiler 960 // has emitted debug information matching the "state before" 961 // a given bytecode or the state after, so we try both The comment justifying this logic claims the compiler can emit debug information with the "after" state instead of the "before" state. If this is not true, then we can remove this VerifyStack logic. If it is true, it would be good to understand exactly under what circumstances it can happen. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2868120219 From vlivanov at openjdk.org Sat May 10 03:18:03 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 10 May 2025 03:18:03 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Fri, 9 May 2025 16:06:13 GMT, Marc Chevalier wrote: > My goal is to start with control-less nodes, but find the late control during loop optimization, control-pin them at this point (because that's when the information is available) with both control input and output (needed for the expansion in CallLeaf), and continuing with control-pinned nodes. If you combine lowering with pinning, you could replace a data node with a CFG node (CallLeaf in your case) at the point in CFG you choose.
A single CFG node is enough to insert a CFG-only node, but you need to ensure the graph stays schedulable after the insertion. If you want to start with pinned node, the simplest way would be to make `CallPure` a subclass of `CallLeaf`, require it to be CFG-only (no memory in/out, no IO, etc) and populate only control in/out when inserting it into the graph during parsing. > For now, I'm happy with the control I get from parsing. Keep in mind that it assumes the node is pinned in CFG from the very beginning. Once the node starts in data-only mode, the control input it gained during parsing may end up too early for node's inputs to be scheduleable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2868277578 From fyang at openjdk.org Sat May 10 03:21:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 10 May 2025 03:21:51 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: <-TcGxaFy_u5sBfFuuCCDErFThK3ASl7IvsVk7Hiv9os=.bfe1f125-5c24-4b27-998d-f1666c7b92c3@github.com> On Fri, 9 May 2025 23:23:04 GMT, Vladimir Kozlov wrote: >> @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. >> >> AOT code caching should be limited to supported platforms: x64 and aarch64. >> >> Testing: GHA > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Limit platforms to run AOTCode test LGTM. I am witnessing similar issues on linux-riscv as well. `make test TEST="runtime/cds/appcds"` passed on linux-riscv after this change. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25158#pullrequestreview-2830144123 From fyang at openjdk.org Sat May 10 04:17:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 10 May 2025 04:17:51 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub In-Reply-To: References: Message-ID: <2WA4b6KL8VLRQmDcyxsKv9RBWlX6oR7gU8gdiZRBBhU=.352278ea-a6ac-4681-95da-f9a64d6405e5@github.com> On Fri, 9 May 2025 03:19:11 GMT, Anjian-Wen wrote: > When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. > We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` > with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 > > This fixes this issue by using a small loop to fill the elements for short arrays. > Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference: > 1. (`@Param("5") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op > ArrayFill.zeroByteArray 5 avgt 12 559.249 ± 1.909 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.006 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.776 ± 0.005 ns/op > ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.345 ±
0.005 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op > > 2. (`@Param("3") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op > ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op > ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op > ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.631 ± 0.003 ns/op > ArrayFill.fillShortArray 3 avgt 12 20.436 ± 0.288 n... PS: Would you mind adding following small change while you are on it? This optimizes out three register-register moves in this array fill stub. [25135-addon.diff.txt](https://github.com/user-attachments/files/20133615/25135-addon.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25135#issuecomment-2868324414 From qamai at openjdk.org Sat May 10 05:26:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 10 May 2025 05:26:52 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt, etc. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now!
It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc I think a very simple approach you can take is having `CallPureNode` as a pure data node. It does not have to have anything to do with `CallNode` (no lowering into a `CallNode`, no subclass from `CallNode`) and it can have its mach implementation like this: instruct pureCall1F(xmm0 dst, xmm0 src) %{ match(Set dst (CallPure src)); effect(CALL); format %{ __ call(/*something*/); %} %} ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2868400653 From duke at openjdk.org Sat May 10 05:56:46 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sat, 10 May 2025 05:56:46 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub [v2] In-Reply-To: References: Message-ID: <_peFZRHmDi_lsVBqofs4Is3cAyclhaFhUbT-AAmO0bE=.0dea8eb5-5961-47d2-bda6-0c626015d4cd@github.com> > When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. > We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. 
This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` > with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 > > This fixes this issue by using a small loop to fill the elements for short arrays. > Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference: > 1. (`@Param("5") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op > ArrayFill.zeroByteArray 5 avgt 12 559.249 ± 1.909 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.006 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.776 ± 0.005 ns/op > ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op > > 2. (`@Param("3") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op > ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op > ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op > ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.631 ± 0.003 ns/op > ArrayFill.fillShortArray 3 avgt 12 20.436 ± 0.288 n...
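The fix described in the quoted description can be modeled roughly as follows. This is only a control-flow sketch in Java, not the RISC-V stub, which is generated assembly; Java cannot express a misaligned store directly, so the wide path is approximated with 8-byte `ByteBuffer` writes. The class name `ShortFillSketch`, the method `fill`, and the constant `WIDE_THRESHOLD_BYTES` are hypothetical, and the 8-byte threshold is taken from the "(< 8 bytes)" remark in the description.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ShortFillSketch {
    // Arrays shorter than this many bytes are filled one element at a time.
    // Assumed value, taken from the "(< 8 bytes)" remark in the thread.
    static final int WIDE_THRESHOLD_BYTES = 8;

    public static void fill(byte[] a, byte v) {
        int i = 0;
        if (a.length >= WIDE_THRESHOLD_BYTES) {
            // Replicate the byte into all eight lanes of a 64-bit pattern.
            long pattern = (v & 0xFFL) * 0x0101010101010101L;
            ByteBuffer buf = ByteBuffer.wrap(a).order(ByteOrder.nativeOrder());
            // Wide path: store eight bytes per iteration.
            for (; i + 8 <= a.length; i += 8) {
                buf.putLong(i, pattern);
            }
        }
        // Short arrays and any leftover tail: a simple per-element loop,
        // which makes no assumption about the alignment of the data.
        for (; i < a.length; i++) {
            a[i] = v;
        }
    }

    public static void main(String[] args) {
        byte[] small = new byte[5];
        fill(small, (byte) 7);
        System.out.println(java.util.Arrays.toString(small));
    }
}
```

The point of the design is that the per-element loop is used whenever the array is too short for the wide path, rather than issuing wider stores that assume heapword alignment.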
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: register optimize ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25135/files - new: https://git.openjdk.org/jdk/pull/25135/files/a89a0ba3..a668825b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25135&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25135&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25135.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25135/head:pull/25135 PR: https://git.openjdk.org/jdk/pull/25135 From duke at openjdk.org Sat May 10 05:56:46 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sat, 10 May 2025 05:56:46 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub In-Reply-To: References: Message-ID: On Fri, 9 May 2025 03:19:11 GMT, Anjian-Wen wrote: > When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. > We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` > with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 > > This fixes this issue by using a small loop to fill the elements for short arrays. > Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference: > 1. (`@Param("5") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op > ArrayFill.zeroByteArray 5 avgt 12 559.249 ±
1.909 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.006 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.776 ± 0.005 ns/op > ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op > > 2. (`@Param("3") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op > ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op > ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op > ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.631 ± 0.003 ns/op > ArrayFill.fillShortArray 3 avgt 12 20.436 ± 0.288 n... That looks good, done! > PS: Would you mind adding following small change while you are on it? This optimizes out three register-register moves in this array fill stub. 
[25135-addon.diff.txt](https://github.com/user-attachments/files/20133615/25135-addon.diff.txt) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25135#issuecomment-2868427158 From fyang at openjdk.org Sat May 10 06:47:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Sat, 10 May 2025 06:47:54 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub [v2] In-Reply-To: <_peFZRHmDi_lsVBqofs4Is3cAyclhaFhUbT-AAmO0bE=.0dea8eb5-5961-47d2-bda6-0c626015d4cd@github.com> References: <_peFZRHmDi_lsVBqofs4Is3cAyclhaFhUbT-AAmO0bE=.0dea8eb5-5961-47d2-bda6-0c626015d4cd@github.com> Message-ID: <8U2XppIHilycBx-IMad6fJN-V-eDO4z-tYdajwx4EHI=.6e84675b-f4e8-4165-9b0e-3bdc5ca6f46c@github.com> On Sat, 10 May 2025 05:56:46 GMT, Anjian-Wen wrote: >> When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. >> We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` >> with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 >> >> This fixes this issue by using a small loop to fill the elements for short arrays. >> Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference: >> 1. (`@Param("5") private int size;`): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op >> ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op >> ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op >> ArrayFill.zeroByteArray 5 avgt 12 559.249 ± 1.909 ns/op >> ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op >> ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 
0.006 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op >> ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op >> ArrayFill.fillShortArray 5 avgt 12 30.776 ± 0.005 ns/op >> ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op >> ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op >> ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op >> >> 2. (`@Param("3") private int size;`): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op >> ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op >> ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op >> ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op >> ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op >> ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op >> ArrayFill.fillIntArray 3 avgt 12... > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > register optimize Thanks! ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25135#pullrequestreview-2830465437 From kvn at openjdk.org Sat May 10 15:03:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 10 May 2025 15:03:50 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: <-TcGxaFy_u5sBfFuuCCDErFThK3ASl7IvsVk7Hiv9os=.bfe1f125-5c24-4b27-998d-f1666c7b92c3@github.com> References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> <-TcGxaFy_u5sBfFuuCCDErFThK3ASl7IvsVk7Hiv9os=.bfe1f125-5c24-4b27-998d-f1666c7b92c3@github.com> Message-ID: <9mQAfiyhzQji6AVwDxrGMnWQ2FqXTGJi3Io7wejVw4o=.c299e1be-cfdc-458a-abf0-1abee2876fc2@github.com> On Sat, 10 May 2025 03:19:07 GMT, Fei Yang wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Limit platforms to run AOTCode test > > LGTM. I am witnessing similar issues on linux-riscv as well. > `make test TEST="runtime/cds/appcds"` passed on linux-riscv after this change. Thank you, @RealFYang for review and testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25158#issuecomment-2868951537 From dlong at openjdk.org Sat May 10 19:19:51 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 10 May 2025 19:19:51 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v2] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 15:22:56 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. 
When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that do not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. 
>> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: > > - Add lookupswitch and tableswitch > - Elaborate why we need that class file version > - Reorganized tests I forgot that uncommon traps always use reexecute semantics, even if they don't explicitly set the reexecute flag on the debug info. That's why they can update the JVM state and advance to the next bci. It seems like it would be safe to skip the falls_through/try_next_mask logic for the top frame if we are reexecuting at the current bci, but that is a riskier change. BTW, because some uncommon traps advance to the next bci, code like Deoptimization::gather_statistics() will not be gathering statistics for the bytecode that causes the trap, making the statistics less meaningful. It seems better to allow uncommon traps to use the current bci instead and let the deoptimization code advance it if the reexecute flag is not set. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2869119219 From jbhateja at openjdk.org Sun May 11 07:55:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 11 May 2025 07:55:01 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs Message-ID: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> This is a follow-up PR#22755 to improve Float16 operations inferencing. 
The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. Best Regards, Jatin ------------- Commit messages: - Adding test points and some constant folding cleanups - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8352635 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 - 8352635: Improve inferencing of Float16 operations with constant inputs Changes: https://git.openjdk.org/jdk/pull/24179/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352635 Stats: 330 lines in 5 files changed: 199 ins; 75 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From jbhateja at openjdk.org Sun May 11 07:56:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 11 May 2025 07:56:52 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: <7caj9HDo6fsjr1Pz538KhersIBUeaQcppGjZu6cjWh4=.b9ddb988-ed5e-4c25-9b16-54c86cdf92e2@github.com> On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. > > Testing: > Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. LGTM ------------- Marked as reviewed by jbhateja (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25141#pullrequestreview-2831296833 From syan at openjdk.org Sun May 11 14:36:50 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 11 May 2025 14:36:50 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Fri, 9 May 2025 15:39:35 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). >> >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op >> MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op >> MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op >> MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > generate_unsafecopy_common_error_exit Changes requested by syan (Committer). test/micro/org/openjdk/bench/java/lang/foreign/MemorySegmentFillUnsafe.java line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. 
2025 may be more suitable ------------- PR Review: https://git.openjdk.org/jdk/pull/25147#pullrequestreview-2831415179 PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2083540891 From jkarthikeyan at openjdk.org Mon May 12 02:35:50 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 12 May 2025 02:35:50 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v3] In-Reply-To: References: Message-ID: > Hi all, > This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. > > Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: - Replace uabs usage with ABS - Merge branch 'master' into abs-value - Merge - Improve AbsNode::Value ------------- Changes: https://git.openjdk.org/jdk/pull/23685/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23685&range=02 Stats: 155 lines in 2 files changed: 140 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23685/head:pull/23685 PR: https://git.openjdk.org/jdk/pull/23685 From jkarthikeyan at openjdk.org Mon May 12 02:35:53 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 12 May 2025 02:35:53 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:48:20 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. >> >> Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge > - Improve AbsNode::Value Thank you all for the comments! I've pushed an update that refactors the method to check for `min_value` ahead of doing abs, so that we can safely use `ABS()` instead of `uabs()`. I've refactored the behavior for constants to avoid using `uabs` there as well. A re-review would be appreciated! 
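For readers following the thread, the range reasoning described above can be modelled in plain Java. This is only a sketch under stated assumptions, not the HotSpot code from the PR: the class `AbsRangeSketch` and the method `absRange` are hypothetical names, and the `{lo, hi}` array stands in for C2's integer type. It checks for the minimum value before negating anything, which is the ordering the update describes.

```java
// Hypothetical illustration only; names are not taken from the JDK sources.
final class AbsRangeSketch {
    /**
     * Returns {lo, hi} bounding Math.abs(x) for every int x in [lo, hi].
     */
    static int[] absRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        // Math.abs(Integer.MIN_VALUE) overflows back to Integer.MIN_VALUE,
        // so a range containing it is conservatively widened to the full type.
        if (lo == Integer.MIN_VALUE) {
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        if (lo >= 0) {
            return new int[] {lo, hi};          // already non-negative
        }
        if (hi <= 0) {
            return new int[] {-hi, -lo};        // entirely non-positive: mirror it
        }
        // Range straddles zero: smallest result is 0, largest is the wider side.
        return new int[] {0, Math.max(-lo, hi)};
    }
}
```

With this sketch, an input range of `[-4, 3]` yields `[0, 4]`, and any range starting at `Integer.MIN_VALUE` yields the full `int` range.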
------------- PR Comment: https://git.openjdk.org/jdk/pull/23685#issuecomment-2870549896 From jkarthikeyan at openjdk.org Mon May 12 02:43:00 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 12 May 2025 02:43:00 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: <8E2H46eKkBbimhaPlfDf1qyYsB-bLC7Y5JsZmGCx9rU=.7db1d0ec-b452-4643-a9b5-58ae608de595@github.com> References: <8E2H46eKkBbimhaPlfDf1qyYsB-bLC7Y5JsZmGCx9rU=.7db1d0ec-b452-4643-a9b5-58ae608de595@github.com> Message-ID: <4pWjq5DICx7Oe4WrzYeYepJhHW3AntDEV1-0EXmkaew=.ebd4fc4b-a966-4b86-96c4-ef5e6615affa@github.com> On Mon, 7 Apr 2025 08:27:24 GMT, Damon Fenacci wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > test/hotspot/jtreg/compiler/c2/irTests/TestIRAbs.java line 249: > >> 247: @DontCompile >> 248: public void checkIntRange(int i) { >> 249: Asserts.assertEquals(Math.abs((i & 7) - 4) > 4, testIntRange1(i)); > > Cool improvement @jaskarth! > It might not be directly related to your optimization and marginally relevant but I was wondering if it would make sense to widen the choice of constants a bit (maybe adding few more or some randomly generated ones)? Thanks for taking a look! Currently we're using the Generators API to create randomly distributed integers to test with (in `checkIntRanges`, from which `checkIntRange` is called) while the constants here are to force the AbsNode to have a range of `[-4, 3]` to ensure that it can be statically optimized away. 
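To make the `[-4, 3]` claim concrete, here is a small Java sketch of the expression quoted from the test. The class and method names are hypothetical and only the expression `(i & 7) - 4` comes from the thread: `i & 7` is always in `[0, 7]`, so subtracting 4 gives `[-4, 3]`, the absolute value is in `[0, 4]`, and the comparison `> 4` can never be true.

```java
// Hypothetical illustration only; mirrors the expression in the quoted test line.
final class MaskedRangeSketch {
    /** Maps any int into the fixed range [-4, 3]. */
    static int masked(int i) {
        return (i & 7) - 4;   // (i & 7) is in [0, 7] for every int i
    }

    /** Always false, because Math.abs(masked(i)) is at most 4. */
    static boolean absExceedsFour(int i) {
        return Math.abs(masked(i)) > 4;
    }
}
```

Since the result does not depend on `i`, a compiler that knows the range of the absolute value can fold the comparison to a constant.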
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2083729000 From duke at openjdk.org Mon May 12 02:48:55 2025 From: duke at openjdk.org (duke) Date: Mon, 12 May 2025 02:48:55 GMT Subject: RFR: 8356593: RISC-V: Small improvement to array fill stub [v2] In-Reply-To: <_peFZRHmDi_lsVBqofs4Is3cAyclhaFhUbT-AAmO0bE=.0dea8eb5-5961-47d2-bda6-0c626015d4cd@github.com> References: <_peFZRHmDi_lsVBqofs4Is3cAyclhaFhUbT-AAmO0bE=.0dea8eb5-5961-47d2-bda6-0c626015d4cd@github.com> Message-ID: On Sat, 10 May 2025 05:56:46 GMT, Anjian-Wen wrote: >> When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. >> We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` >> with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 >> >> This fixes this issue by using a small loop to fill the elements for short arrays. >> Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference: >> 1. 
(`@Param("5") private int size;`): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op >> ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op >> ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op >> ArrayFill.zeroByteArray 5 avgt 12 559.249 ± 1.909 ns/op >> ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op >> ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.006 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op >> ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op >> ArrayFill.fillShortArray 5 avgt 12 30.776 ± 
0.005 ns/op >> ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op >> ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op >> ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op >> >> 2. (`@Param("3") private int size;`): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op >> ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op >> ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op >> ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op >> ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op >> ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op >> ArrayFill.fillIntArray 3 avgt 12... > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > register optimize @Anjian-Wen Your change (at version a668825b908ecf9673cb8ff2312a44ec05bda643) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25135#issuecomment-2870568252 From duke at openjdk.org Mon May 12 03:04:59 2025 From: duke at openjdk.org (Anjian-Wen) Date: Mon, 12 May 2025 03:04:59 GMT Subject: Integrated: 8356593: RISC-V: Small improvement to array fill stub In-Reply-To: References: Message-ID: On Fri, 9 May 2025 03:19:11 GMT, Anjian-Wen wrote: > When working on [JDK-8351140](https://bugs.openjdk.org/browse/JDK-8351140), I witnessed possible misaligned memory access in array fill stub. > We fill by element for short arrays (< 8 bytes), which assumes a heapword alignment[1]. But that is not guaranteed. This issue could be reproduced by running: `make test TEST="micro:vm.compiler.ArrayFill"` > with `@Param("5") private int size;` on riscv platforms with slow misaligned memory accesses. 
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp#L2141 > > This fixes this issue by using a small loop to fill the elements for short arrays. > Tier1-2 tested on linux-riscv64 platform. JMH result on P550 SBC for reference: > 1. (`@Param("5") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 558.781 ± 1.396 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.346 ± 0.003 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.779 ± 0.004 ns/op > ArrayFill.zeroByteArray 5 avgt 12 559.249 ± 1.909 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.002 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.006 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 5 avgt 12 23.977 ± 0.004 ns/op > ArrayFill.fillIntArray 5 avgt 12 29.343 ± 0.004 ns/op > ArrayFill.fillShortArray 5 avgt 12 30.776 ± 0.005 ns/op > ArrayFill.zeroByteArray 5 avgt 12 23.977 ± 0.002 ns/op > ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op > ArrayFill.zeroShortArray 5 avgt 12 30.776 ± 0.004 ns/op > > 2. (`@Param("3") private int size;`): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 428.923 ± 0.409 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.fillShortArray 3 avgt 12 558.872 ± 2.641 ns/op > ArrayFill.zeroByteArray 3 avgt 12 429.744 ± 2.049 ns/op > ArrayFill.zeroIntArray 3 avgt 12 28.628 ± 0.002 ns/op > ArrayFill.zeroShortArray 3 avgt 12 557.682 ± 1.661 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 3 avgt 12 21.471 ± 0.002 ns/op > ArrayFill.fillIntArray 3 avgt 12 28.631 ± 0.003 ns/op > ArrayFill.fillShortArray 3 avgt 12 20.436 ± 0.288 n... This pull request has now been integrated. 
Changeset: d7cb933b Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/d7cb933b89839b692f5562aeeb92076cd25a99f6 Stats: 69 lines in 1 file changed: 37 ins; 24 del; 8 mod 8356593: RISC-V: Small improvement to array fill stub Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/25135 From jkarthikeyan at openjdk.org Mon May 12 03:11:52 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 12 May 2025 03:11:52 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: References: Message-ID: > Hi all, > This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: > > > Baseline Patch > Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement > VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) > VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) > VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) > > > I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! 
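As an illustration of the loop shape this patch is about, here is a minimal Java sketch of narrowing-cast loops. The benchmark source is not shown in this thread, so the class `SubwordCastSketch` and the method bodies are assumptions inferred only from the benchmark names `intToByte`, `intToShort`, and `shortToByte`; the actual benchmark kernels may differ.

```java
// Hypothetical illustration only; the real benchmark kernels are not shown in the thread.
final class SubwordCastSketch {
    /** Narrowing int -> byte: each load is an int, each store is a byte. */
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    /** Narrowing int -> short. */
    static void intToShort(int[] src, short[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    /** Narrowing short -> byte. */
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }
}
```

In each loop the load and the store have different element types, which is the situation the description says previously made superword discard the packs. Each destination array is assumed to be at least as long as its source.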
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Check for AVX2 for byte/long conversions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23413/files - new: https://git.openjdk.org/jdk/pull/23413/files/03ee1154..78934c96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=11-12 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Mon May 12 03:11:54 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 12 May 2025 03:11:54 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12] In-Reply-To: References: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com> Message-ID: On Mon, 5 May 2025 13:51:29 GMT, Emanuel Peter wrote: >> Thanks a lot for running the benchmark on your AVX512 machine! The results are very interesting, in the char cases it looks like we over-unroll the loop with SuperWord enabled even though we don't end up vectorizing the loop, fixing that could solve the slowdown. Since you mentioned the unroll amount was 32x, it might be unrolling to fill a vector (`512/sizeof(char) = 32`). >> >>> Wait, but you seem to say that you want to support `casting to T_CHAR`. But is the issue not casting FROM char? >> >> You are correct, I think that is my mistake. It looks like casting to char is supported because stores to both short and char become `StoreC`, but casting from char isn't supported because we have no `VectorCastC2X` node. I'll update the bug to make it more accurate. >> >> I've also pushed a small commit to remove some extra whitespace and to make the benchmark run faster. 
> > @jaskarth Just checked the internal testing. Saw this failure with `-XX:UseAVX=1`: > > > Failed IR Rules (2) of Methods (2) > ---------------------------------- > 1) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testByteToLong(byte[],long[])" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_B2L#_", "_@min(max_byte, max_long)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(VectorCastB2X.*)+(\\s){2}===.*vector[A-Za-z])" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! > > 2) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testLongToByte(long[],byte[])" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_L2B#_", "_@min(max_long, max_byte)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(VectorCastL2X.*)+(\\s){2}===.*vector[A-Za-z])" > - Failed comparison: [found] 0 > 0 [given] > - No nodes matched! @eme64 Thanks for the testing results! It looks like byte<->long conversion isn't supported with AVX1, so I've pushed a small change to make the test check for AVX2 in those cases instead. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2870630280 From haosun at openjdk.org Mon May 12 03:22:51 2025 From: haosun at openjdk.org (Hao Sun) Date: Mon, 12 May 2025 03:22:51 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. > > Testing: > Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. Marked as reviewed by haosun (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25141#pullrequestreview-2831707842 From haosun at openjdk.org Mon May 12 03:50:52 2025 From: haosun at openjdk.org (Hao Sun) Date: Mon, 12 May 2025 03:50:52 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Wed, 7 May 2025 14:14:14 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 225: > 223: case Op_MaxVHF: > 224: case Op_SqrtVHF: > 225: // FEAT_FP16 is enabled if both "fphp" and "asimdhp" features are supported. It's a unary op and we should add it to `is_vector_unary_op_name` in `adlc/dfa.cpp`. 
See the related code in the previous patch https://github.com/openjdk/jdk/pull/9534 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2083771683 From amitkumar at openjdk.org Mon May 12 03:52:56 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 12 May 2025 03:52:56 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: <2NUKCBO7aaoQYPLVWn_rJ4nL28qtgm1OqeD6Zhil2mQ=.f5eca835-22bf-44c1-a2e1-71bdf1cd9401@github.com> References: <2NUKCBO7aaoQYPLVWn_rJ4nL28qtgm1OqeD6Zhil2mQ=.f5eca835-22bf-44c1-a2e1-71bdf1cd9401@github.com> Message-ID: <1TYgAXK73h2YE6-vEvg1wKEmLiqrl88fa5OiSkPu0qU=.0050c295-0bb9-4a2e-a81f-fcb08e24efe5@github.com> On Fri, 9 May 2025 13:30:26 GMT, Martin Doerr wrote: > Thanks! That sounds like mvc should better not be used for `Unsafe` operations. Seeing no failures in some tests doesn't prove that it's safe. @TheRealMDoerr But in this case MVC will only be used iff store is unaligned. If they are unaligned then we don't care about the atomicity. In other case, we will use `sth`, `st`, `stg` as per alignment. And current C++ implementation is also emitting `mvc` instruction for unaligned case. Which is the behaviour this stub will replicate. If we don't go ahead with mvc, then we are seeing regression, as you have noticed in the previous result. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2870724876 From jbhateja at openjdk.org Mon May 12 04:24:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 12 May 2025 04:24:34 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v2] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. 
> > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. > > Best Regards, > Jatin Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Adding test points and some re-factoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/021009c0..4a7a519b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=00-01 Stats: 47 lines in 3 files changed: 33 ins; 4 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From enikitin at openjdk.org Mon May 12 06:08:28 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Mon, 12 May 2025 06:08:28 GMT Subject: RFR: 8356702: CTW: Update modules Message-ID: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> This PR enhances the CTW test wrapper generator to make it more user-friendly. Added features are: 1. Automatic scanning for the modules list under `open/src`; 2. Automatic recognition of the current year; 3. Multi-wrapper module support (allows splitting huge modules into two or more wrappers); 4. Ability to exclude modules. The updated generator has been used to refresh the JTReg module wrappers. The most meaningful change is contained in `generate.bash`. Testing: `open/test/hotspot/jtreg/applications/ctw/modules` with the supported platforms, no failures spotted. 
------------- Commit messages: - Update modified wrappers - Add automatic year calculation - Add excluded modules - Add support for multi-part modules - Add optional scanning for modules - Extract generation into a function Changes: https://git.openjdk.org/jdk/pull/25175/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25175&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356702 Stats: 117 lines in 68 files changed: 40 ins; 1 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/25175.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25175/head:pull/25175 PR: https://git.openjdk.org/jdk/pull/25175 From hgreule at openjdk.org Mon May 12 06:15:53 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 12 May 2025 06:15:53 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 12:24:34 GMT, Emanuel Peter wrote: >> Thanks for testing again @eme64. Are the results in? > > @SirYwell They are **almost** completed, but so far no failure :) @eme64 just to make sure, I assume we can integrate this now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2870966962 From epeter at openjdk.org Mon May 12 06:26:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 06:26:57 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 06:13:15 GMT, Hannes Greule wrote: >> @SirYwell They are **almost** completed, but so far no failure :) > > @eme64 just to make sure, I assume we can integrate this now? @SirYwell Yes, it is all green :green_circle: Ship it :)
------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2871014695 From epeter at openjdk.org Mon May 12 06:29:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 06:29:56 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12] In-Reply-To: References: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com> Message-ID: On Mon, 12 May 2025 03:08:28 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Just checked the internal testing. Saw this failure with `-XX:UseAVX=1`: >> >> >> Failed IR Rules (2) of Methods (2) >> ---------------------------------- >> 1) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testByteToLong(byte[],long[])" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_B2L#_", "_ at min(max_byte, max_long)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(VectorCastB2X.*)+(\\s){2}===.*vector[A-Za-z])" >> - Failed comparison: [found] 0 > 0 [given] >> - No nodes matched! 
>> >> 2) Method "public java.lang.Object[] compiler.loopopts.superword.TestCompatibleUseDefTypeSize.testLongToByte(long[],byte[])" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#VECTOR_CAST_L2B#_", "_ at min(max_long, max_byte)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={"AlignVector", "false", "UseCompactObjectHeaders", "false"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={"avx", "true"}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(VectorCastL2X.*)+(\\s){2}===.*vector[A-Za-z])" >> - Failed comparison: [found] 0 > 0 [given] >> - No nodes matched! > > @eme64 Thanks for the testing results! It looks like byte<->long conversion isn't supported with AVX1, so I've pushed a small to make the test to check for AVX2 in those cases instead. @jaskarth Excellent! I'll run another round of testing, just to be sure :) Please ping me again in 24h! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2871032679 From epeter at openjdk.org Mon May 12 06:35:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 06:35:51 GMT Subject: RFR: 8351950: C2: masked vector MIN/MAX AVX512: SIGFPE / no valid evex tuple_table entry In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Sat, 3 May 2025 15:49:24 GMT, Jatin Bhateja wrote: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. 
Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. > > > 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 > EVEX OP MR SIB DISP IMM > --------------|----|----|----|---------------|-----| > 62 6b c1 40 25 84 ec 40 30 20 10 ff > > 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 > For full vector width operation, scalar matches with vector size, hence scale N = 64 > effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 > EVEX OP MR SIB DISP IMM > -------------|----|---|---|-----------|---| > 62 6b c1 40 25 44 ec 01 ff > > > Kindly review and share your feedback. > > Best Regards, > Jatin @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2871046053 From hgreule at openjdk.org Mon May 12 06:39:51 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 12 May 2025 06:39:51 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v6] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 08:55:30 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > correct driver path Thanks for your reviews! 
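For readers of this thread, the constant folding described in the PR summary quoted above can be pictured at the Java level. This is only a minimal sketch, assuming that each byte-reversal call below is what the `ReverseBytes` nodes in the PR title stand for; the class and method names are hypothetical, and the example shows the value a folded constant would have to match, not how C2 performs the fold.

```java
public class ReverseBytesFoldingSketch {
    // Each method applies a byte reversal to a compile-time constant.
    // A compiler that folds such an operation can replace the call with
    // the literal named in the comment; the Java-visible result is the
    // same whether or not the fold happens.

    static int foldedInt() {
        return Integer.reverseBytes(0x12345678);        // equals 0x78563412
    }

    static long foldedLong() {
        return Long.reverseBytes(0x0102030405060708L);  // equals 0x0807060504030201L
    }

    static short foldedShort() {
        return Short.reverseBytes((short) 0x1234);      // equals (short) 0x3412
    }

    static char foldedChar() {
        return Character.reverseBytes((char) 0x1234);   // equals (char) 0x3412
    }

    public static void main(String[] args) {
        // Print the results in hex so they can be compared with the comments above.
        System.out.println(Integer.toHexString(foldedInt()));
        System.out.println(Long.toHexString(foldedLong()));
        System.out.println(Integer.toHexString(foldedShort() & 0xFFFF));
        System.out.println(Integer.toHexString(foldedChar()));
    }
}
```

Whether a given call is actually folded depends on the JIT and on the inputs being constants at compile time; the sketch only fixes the expected values.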
------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2871052562 From duke at openjdk.org Mon May 12 06:39:52 2025 From: duke at openjdk.org (duke) Date: Mon, 12 May 2025 06:39:52 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v6] In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 08:55:30 GMT, Hannes Greule wrote: >> This change implements constant folding for ReverseBytes nodes. >> >> Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. >> >> I appreciate any reviews and comments. > > Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: > > correct driver path @SirYwell Your change (at version 3a94bbe6d63b91171c948ea4a1496c3b918f1d8a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2871053425 From xgong at openjdk.org Mon May 12 06:44:52 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 May 2025 06:44:52 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Wed, 7 May 2025 14:14:14 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2749: > 2747: void adv_simd_three_same(Instruction_aarch64 &current_insn, FloatRegister Vd, > 2748: SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, > 2749: int op1, int op2, int op3); May I ask why you move this to the .cpp file?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2083923678 From epeter at openjdk.org Mon May 12 06:47:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 06:47:02 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes [v6] In-Reply-To: References: Message-ID: <-MCTIjyLTjNiIFdUVPjm31nh2zxgqyKzZ1yChxIbUyg=.e7d9134a-fcfc-4517-a8ff-510e82536d33@github.com> On Mon, 12 May 2025 06:37:06 GMT, Hannes Greule wrote: >> Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: >> >> correct driver path > > Thanks for your reviews! @SirYwell Thanks for the work :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24382#issuecomment-2871068616 From hgreule at openjdk.org Mon May 12 06:47:03 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 12 May 2025 06:47:03 GMT Subject: Integrated: 8353551: C2: Constant folding for ReverseBytes nodes In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 16:17:57 GMT, Hannes Greule wrote: > This change implements constant folding for ReverseBytes nodes. > > Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. > > I appreciate any reviews and comments. This pull request has now been integrated. 
Changeset: de801fea Author: Hannes Greule Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/de801fea76b1328f3bda503088618162388eb119 Stats: 235 lines in 3 files changed: 227 ins; 0 del; 8 mod 8353551: C2: Constant folding for ReverseBytes nodes Reviewed-by: epeter, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/24382 From xgong at openjdk.org Mon May 12 06:48:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 May 2025 06:48:53 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Wed, 7 May 2025 14:14:14 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` src/hotspot/cpu/aarch64/aarch64_vector.ad line 698: > 696: instruct vaddHF_masked(vReg dst_src1, vReg src2, pRegGov pg) %{ > 697: predicate(UseSVE > 0); > 698: match(Set dst_src1 (AddVHF (Binary dst_src1 src2) pg)); Do we have such a case in existing jtreg now? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2083932385 From pminborg at openjdk.org Mon May 12 06:54:51 2025 From: pminborg at openjdk.org (Per Minborg) Date: Mon, 12 May 2025 06:54:51 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Fri, 9 May 2025 15:39:35 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). >> >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op >> MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op >> MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op >> MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > generate_unsafecopy_common_error_exit Looking at the improvements made, I suggest we also change (in `SegmentBulkOperations`): private static final int NATIVE_THRESHOLD_FILL = powerOfPropertyOr("fill", Architecture.isAARCH64() ?
18 : 5); to private static final int NATIVE_THRESHOLD_FILL = powerOfPropertyOr("fill", 5); ------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2871092439 From rcastanedalo at openjdk.org Mon May 12 07:30:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 May 2025 07:30:59 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 18:10:09 GMT, Daniel Lundén wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`.
>> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Roberto Castañeda Lozano Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2832115145 From mhaessig at openjdk.org Mon May 12 07:34:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 12 May 2025 07:34:52 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. > > Testing: > Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. Looks good to me as well. I just kicked off testing and will come back to you once the results are in. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25141#issuecomment-2871224005 From aph at openjdk.org Mon May 12 07:49:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 May 2025 07:49:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: <1d7X7DQI-GBy7lItl_rFGe5Rs7y5AfwHPCv7i547QaI=.d9542f26-c533-4eb2-ad2a-97098342e370@github.com> On Mon, 12 May 2025 06:37:13 GMT, Xiaohong Gong wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma.
>> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2749: > >> 2747: void adv_simd_three_same(Instruction_aarch64 &current_insn, FloatRegister Vd, >> 2748: SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, >> 2749: int op1, int op2, int op3); > > May I ask why you move this to the .cpp file? It's not a bad thing to do. Once the function gets so large, especially when it is inlined many times, that's a wise thing to do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2084038897 From xgong at openjdk.org Mon May 12 07:52:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 May 2025 07:52:51 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: <1d7X7DQI-GBy7lItl_rFGe5Rs7y5AfwHPCv7i547QaI=.d9542f26-c533-4eb2-ad2a-97098342e370@github.com> References: <1d7X7DQI-GBy7lItl_rFGe5Rs7y5AfwHPCv7i547QaI=.d9542f26-c533-4eb2-ad2a-97098342e370@github.com> Message-ID: On Mon, 12 May 2025 07:47:35 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2749: >> >>> 2747: void adv_simd_three_same(Instruction_aarch64 &current_insn, FloatRegister Vd, >>> 2748: SIMD_Arrangement T, FloatRegister Vn, FloatRegister Vm, >>> 2749: int op1, int op2, int op3); >> >> May I ask why you move this to the .cpp file? > > It's not a bad thing to do. Once the function gets so large, especially when it is inlined many times, that's a wise thing to do. Make sense to me. Thanks for your explanation @theRealAph !
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2084044297 From mdoerr at openjdk.org Mon May 12 08:10:51 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 May 2025 08:10:51 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Fri, 9 May 2025 23:23:04 GMT, Vladimir Kozlov wrote: >> @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. >> >> AOT code caching should be limited to supported platforms: x64 and aarch64. >> >> Testing: GHA > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Limit platforms to run AOTCode test LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25158#pullrequestreview-2832235951 From mli at openjdk.org Mon May 12 08:40:29 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 12 May 2025 08:40:29 GMT Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java Message-ID: Hi, Can you help to review this patch? API combination like `av.lanewise(VectorOperators.FMA, bv.neg(), cv, mask).intoArray(fr, i);` should be able to be optimized to some special instruct rather than a bunch of instructs. We can not verify IR node (via `beforeMatchingNameRegex`) in PrintIdeal phase, as it does not mean that the special instruct are generated in FINAL CODE phase, it could be a combination of multiple different instruct. As riscv use different instruct names, so I create new ones in IRNode.java.
Thanks ------------- Commit messages: - modify summary - initial commit Changes: https://git.openjdk.org/jdk/pull/25145/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25145&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356642 Stats: 57 lines in 2 files changed: 34 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/25145.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25145/head:pull/25145 PR: https://git.openjdk.org/jdk/pull/25145 From fyang at openjdk.org Mon May 12 08:40:29 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 12 May 2025 08:40:29 GMT Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:47:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > API combination like `av.lanewise(VectorOperators.FMA, bv.neg(), cv, mask).intoArray(fr, i);` should be able to be optimized to some special instruct rather than a bunch of instructs. > > We can not verify IR node (via `beforeMatchingNameRegex`) in PrintIdeal phase, as it does not mean that the special instruct are generated in FINAL CODE phase, it could be a combination of multiple different instruct. > > As riscv use different instruct names, so I create new ones in IRNode.java. > > Thanks Looks fine. You need to fix the jcheck error. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25145#pullrequestreview-2831581126 From fjiang at openjdk.org Mon May 12 08:40:29 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 12 May 2025 08:40:29 GMT Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:47:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch?
> > API combination like `av.lanewise(VectorOperators.FMA, bv.neg(), cv, mask).intoArray(fr, i);` should be able to be optimized to some special instruct rather than a bunch of instructs. > > We can not verify IR node (via `beforeMatchingNameRegex`) in PrintIdeal phase, as it does not mean that the special instruct are generated in FINAL CODE phase, it could be a combination of multiple different instruct. > > As riscv use different instruct names, so I create new ones in IRNode.java. > > Thanks Looks good! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/25145#pullrequestreview-2832145089 From mli at openjdk.org Mon May 12 08:40:29 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 12 May 2025 08:40:29 GMT Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java In-Reply-To: References: Message-ID: On Mon, 12 May 2025 00:51:12 GMT, Fei Yang wrote: > Looks fine. You need to fix the jcheck error. Thanks! fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25145#issuecomment-2871464101 From mli at openjdk.org Mon May 12 08:40:30 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 12 May 2025 08:40:30 GMT Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java In-Reply-To: References: Message-ID: On Mon, 12 May 2025 07:39:47 GMT, Feilong Jiang wrote: > Looks good! Thank you!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25145#issuecomment-2871464505 From duke at openjdk.org Mon May 12 08:45:54 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 12 May 2025 08:45:54 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v6] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 12:25:11 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. > > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > removing header and modifying method name Thank you for all the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24890#issuecomment-2871498882 From duke at openjdk.org Mon May 12 08:45:55 2025 From: duke at openjdk.org (duke) Date: Mon, 12 May 2025 08:45:55 GMT Subject: RFR: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list [v6] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 12:25:11 GMT, Saranya Natarajan wrote: >> Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. >> >> Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. >> >> Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. >> >> Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. 
> > Saranya Natarajan has updated the pull request incrementally with one additional commit since the last revision: > > removing header and modifying method name @sarannat Your change (at version a50d9f4faabff37bac70f8bb1104e79183b5c7a5) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24890#issuecomment-2871510094 From bkilambi at openjdk.org Mon May 12 08:50:52 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 12 May 2025 08:50:52 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Mon, 12 May 2025 06:43:33 GMT, Xiaohong Gong wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > src/hotspot/cpu/aarch64/aarch64_vector.ad line 698: > >> 696: instruct vaddHF_masked(vReg dst_src1, vReg src2, pRegGov pg) %{ >> 697: predicate(UseSVE > 0); >> 698: match(Set dst_src1 (AddVHF (Binary dst_src1 src2) pg)); > > Do we have such a case in existing jtreg now? Not at the moment. Thanks for pointing this out. I think for this PR, I will remove the predicated instructions support and for now only keep support for non-masked ones. I will add this support when there will be more focus on the masked versions once VectorAPI with Float16Vector is integrated with mainline. Hope this is ok. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2084165085 From duke at openjdk.org Mon May 12 08:53:02 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Mon, 12 May 2025 08:53:02 GMT Subject: Integrated: 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 21:56:10 GMT, Saranya Natarajan wrote: > Issue: The assertion failure , `assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list`, occurs when [loop striping mining ](https://bugs.openjdk.org/browse/JDK-8186027)may create a [MaxL](https://bugs.openjdk.org/browse/JDK-8324655) after macro expansion. > > Analysis : Before the macro nodes are expanded in` expand_macro_nodes`, there is a process where nodes from the macro list are eliminated. This also includes elimination of any `OuterStripMinedLoop` node in the macro list. The bug occurs due to the refining of the strip mined loop in `adjust_strip_mined_loop` function just before it is eliminated. In this case, a` MaxL` node is added to the macro list in `adjust_strip_mined_loop`. > > Fix: The fix involves performing the refining of the strip mined loop before elimination process. More specifically, moving the `adjust_strip_mined_loop` function outside the elimination loop. > > Improvement: The process of eliminating macro nodes by calling `eliminate_macro_nodes` and performing additional Opaque and LoopLimit nodes elimination in ` expand_macro_nodes` is unintuitive as suggested in [JDK-8325478 ](https://bugs.openjdk.org/browse/JDK-8325478) and the current fix should be moved along with the other elimination code. This pull request has now been integrated. 
Changeset: 0258d999 Author: Saranya Natarajan Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/0258d9998ebc523a6463818be00353c6ac8b7c9c Stats: 63 lines in 3 files changed: 62 ins; 1 del; 0 mod 8347515: C2: assert(!success || (C->macro_count() == (old_macro_count - 1))) failed: elimination must have deleted one node from macro list Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24890 From xgong at openjdk.org Mon May 12 08:58:54 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 12 May 2025 08:58:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Mon, 12 May 2025 08:48:26 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64_vector.ad line 698: >> >>> 696: instruct vaddHF_masked(vReg dst_src1, vReg src2, pRegGov pg) %{ >>> 697: predicate(UseSVE > 0); >>> 698: match(Set dst_src1 (AddVHF (Binary dst_src1 src2) pg)); >> >> Do we have such a case in existing jtreg now? > > Not at the moment. Thanks for pointing this out. I think for this PR, I will remove the predicated instructions support and for now only keep support for non-masked ones. I will add this support when there will be more focus on the masked versions once VectorAPI with Float16Vector is integrated with mainline. Hope this is ok. Sounds good to me. But I'm worried it may crash with bad ad file on AArch64 if the Vector API java and compiler IR part is ready for HF types, while the AArch64 relative masked rules are missing. Because the masked vector IR has been generated, while the codegen is missing on AArch64. We have to add the HF ops to `match_rule_supported_vector_masked` first, and then remove them when adding the masked version rules. WDYT?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2084182434 From mdoerr at openjdk.org Mon May 12 09:02:34 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 May 2025 09:02:34 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 
0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation As I said, mvc usage may be a bug. It was probably not intended that gcc generates it for `Unsafe` operations. Atomicity is never a problem when filling memory with bytes. The code is designed to have a defined behavior when hitting signals. That's why `UnsafeMemoryAccessMark` is used. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2871629096 From mhaessig at openjdk.org Mon May 12 09:02:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 12 May 2025 09:02:52 GMT Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:47:41 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > > API combination like `av.lanewise(VectorOperators.FMA, bv.neg(), cv, mask).intoArray(fr, i);` should be able to be optimized to some special instruct rather than a bunch of instructs.
> > We can not verify IR node (via `beforeMatchingNameRegex`) in PrintIdeal phase, as it does not mean that the special instruct are generated in FINAL CODE phase, it could be a combination of multiple different instruct. > > As riscv use different instruct names, so I create new ones in IRNode.java. > > Thanks The changes also look good to me. I kicked off testing and will come back to you once the results are in. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25145#issuecomment-2871634982 From dlunden at openjdk.org Mon May 12 09:04:59 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 12 May 2025 09:04:59 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lundén wrote: > Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. > > ### Changeset > > Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below).
> > The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph. Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). > > I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. Thanks again for all the reviews, integrating this now. 
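The limit arithmetic in the changeset description above can be checked with a short sketch. It assumes only the default values quoted in the description (a fudge factor of 2000, implied by "`NodeLimitFudgeFactor * 2` (4000 by default)", and a `MaxNodeLimit` of 80000); the class and method names are hypothetical and are not part of HotSpot.

```java
// Illustrative only: reproduces the arithmetic from the PR description.
// The constants mirror the defaults quoted there; the names are hypothetical.
final class IgvnLimitSketch {
    static final int NODE_LIMIT_FUDGE_FACTOR = 2000; // implied by "* 2 (4000 by default)"
    static final int MAX_NODE_LIMIT = 80000;          // default quoted in the description

    // Maximum live node increase allowed in a single IGVN iteration.
    static int maxIncreasePerIteration(int multiplier) {
        return NODE_LIMIT_FUDGE_FACTOR * multiplier;
    }

    // Live node count at which IGVN bails out.
    static int bailoutThreshold(int multiplier) {
        return MAX_NODE_LIMIT - maxIncreasePerIteration(multiplier);
    }

    public static void main(String[] args) {
        // Before the changeset (multiplier 2) and after it (multiplier 3).
        System.out.println(maxIncreasePerIteration(2) + " / " + bailoutThreshold(2));
        System.out.println(maxIncreasePerIteration(3) + " / " + bailoutThreshold(3));
    }
}
```

With these defaults the sketch gives 4000 / 76000 before the change and 6000 / 74000 after it, matching the figures in the description.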
------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2871639944 From dlunden at openjdk.org Mon May 12 09:04:59 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 12 May 2025 09:04:59 GMT Subject: Integrated: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> Message-ID: On Wed, 30 Apr 2025 10:30:33 GMT, Daniel Lundén wrote: > Certain idealizations introduce more new nodes than expected when adding the new assert in the changeset for [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833). The limit checked by the new assert is too optimistic. > > ### Changeset > > Tweak the maximum live node increase per iteration in the main IGVN loop from `NodeLimitFudgeFactor * 2` (4000 by default) to `NodeLimitFudgeFactor * 3` (6000 by default). This change does not only affect the newly added assert in [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833), but also the IGVN live node count bailout which is `MaxNodeLimit` minus the maximum live node increase per iteration. That is, the bailout by default is currently at 80000 - 4000 = 76000 live nodes, and 80000 - 6000 = 74000 live nodes after this changeset. In practice, the difference does not matter (see Testing below). > > The motivation for just tweaking the limit and keeping the assert added by [JDK-8351833](https://bugs.openjdk.org/browse/JDK-8351833) is that individual IGVN transformations (within a single iteration of the IGVN loop) should, in theory, only affect a local set of nodes in the ideal graph.
Therefore, the assert is a good sanity check that various transformations (current ones and whatever we might add in the future) do not scale in the size of the ideal graph (i.e., they are local transformations). > > I have not managed to construct a reliable regression test, as triggering the assert is difficult (highly intermittent). Also, the issue is benign (a too optimistic limit). > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14594986152) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Checked IGVN live node count bailouts in DaCapo, Renaissance, SPECjvm, and SPECjbb and observed no bailouts before nor after this changeset. This pull request has now been integrated. Changeset: 2b325416 Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/2b3254160933e8b11527f801507a9c01b90d22b0 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) Reviewed-by: chagedorn, dfenacci, rcastanedalo, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24960 From mdoerr at openjdk.org Mon May 12 09:09:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 12 May 2025 09:09:57 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: <1TYgAXK73h2YE6-vEvg1wKEmLiqrl88fa5OiSkPu0qU=.0050c295-0bb9-4a2e-a81f-fcb08e24efe5@github.com> References: <2NUKCBO7aaoQYPLVWn_rJ4nL28qtgm1OqeD6Zhil2mQ=.f5eca835-22bf-44c1-a2e1-71bdf1cd9401@github.com> <1TYgAXK73h2YE6-vEvg1wKEmLiqrl88fa5OiSkPu0qU=.0050c295-0bb9-4a2e-a81f-fcb08e24efe5@github.com> Message-ID: On Mon, 12 May 2025 03:49:55 GMT, Amit Kumar wrote: > If we don't go ahead with mvc, then we are seeing regression, as you have noticed in the previous result. Are these corner cases relevant at all?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2871668643 From epeter at openjdk.org Mon May 12 09:12:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 09:12:24 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:44:22 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 98: > >> 96: System.out.println("Hello World!"); >> 97: """, >> 98: "int a = ", Integer.valueOf(1), ";\n", > > Might be better to use `System.lineSeparator()` instead of `\n` to be platform independent. @chhagedorn As discussed offline: We figured out that text blocks also normalize newline to `\n`, see https://openjdk.org/jeps/378. Since Templates will make heavy use of text blocks, it makes sense to just keep everything with `\n`. 
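The point about text blocks can be illustrated with a small standalone check. This is a minimal sketch that relies only on the line-terminator normalization specified in JEP 378; the class name is made up and the code is unrelated to the Template Framework itself.

```java
// Illustrative only: a text block always uses "\n" as its line terminator,
// independent of the platform and of the line endings in the source file.
final class TextBlockNewlines {
    static String block() {
        return """
            int a = 1;
            int b = 2;
            """;
    }

    public static void main(String[] args) {
        String s = block();
        // No carriage returns survive normalization, even on Windows.
        System.out.println(s.contains("\r"));
        // The content equals a string built with explicit "\n" separators.
        System.out.println(s.equals("int a = 1;\nint b = 2;\n"));
        // System.lineSeparator() is platform dependent, so mixing it with
        // text blocks could produce inconsistent line endings.
        System.out.println(System.lineSeparator().equals("\n"));
    }
}
```

The last line prints a platform-dependent value, which is the reason for keeping `\n` throughout: tokens and text blocks then agree on every platform.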
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2084205369 From adinn at openjdk.org Mon May 12 09:20:52 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 May 2025 09:20:52 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Fri, 9 May 2025 15:39:35 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). >> >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op >> MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op >> MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op >> MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > generate_unsafecopy_common_error_exit src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2611: > 2609: > 2610: __ subs(count, count, 64); > 2611: __ add(dest, dest, 64); This add could be elided by employing a post-increment on dest in each of the two writes above, saving on code size. Is there a reason to prefer the add?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084225622 From mhaessig at openjdk.org Mon May 12 09:25:43 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 12 May 2025 09:25:43 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: > # Issue Summary > > This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: > > getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" > areturn; > // The following is unreachable > iconst_0; > > > This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: > https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 > > # Change Summary > > To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. 
However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) > - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing > > # Acknowledgements > Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Add jsr to falls_through() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25118/files - new: https://git.openjdk.org/jdk/pull/25118/files/564d2fca..c18b8dc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25118&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25118&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25118.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25118/head:pull/25118 PR: https://git.openjdk.org/jdk/pull/25118 From aph at openjdk.org Mon May 12 09:37:24 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 May 2025 09:37:24 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks
on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). > > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op > MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op > MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op > MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Stub stack frame ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25147/files - new: https://git.openjdk.org/jdk/pull/25147/files/1078ba8c..47179d57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147 PR: https://git.openjdk.org/jdk/pull/25147 From aph at openjdk.org Mon May 12 09:37:24 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 May 2025 09:37:24 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: <8nuQIwaoVWwaSrsAE8oni5LET--OIIsSloJYgenOUY8=.68950b83-957d-48f6-87e1-a63f88197df5@github.com> On Mon, 12 May 2025 09:18:13 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> generate_unsafecopy_common_error_exit > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2611: > >> 2609: >> 2610: __ subs(count, count, 64); >> 2611: __ add(dest, dest, 64); > > This add could be elided by employing a post-increment on dest in each of the two writes above, saving on code size.
Is there a reason to prefer the add? Yes, there is. A post-increment is often an extra cycle. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084252981 From aph at openjdk.org Mon May 12 09:42:51 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 May 2025 09:42:51 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: <6iRBKELx-Xnkgfm4MbONISmz9D3SjO_38PlaQXijv7w=.d6cb630c-7e83-4695-9ccc-6cfa30da5e17@github.com> On Mon, 12 May 2025 06:52:12 GMT, Per Minborg wrote: > Looking at the improvements made, I suggest we also change (in `SegmentBulkOperations`): > > ``` > private static final int NATIVE_THRESHOLD_FILL = powerOfPropertyOr("fill", Architecture.isAARCH64() ? 18 : 5); > ``` > > to > > ``` > private static final int NATIVE_THRESHOLD_FILL = powerOfPropertyOr("fill", 5); > ``` Possibly so, yes, but I'm still looking at the reasons for the differences. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2871790462 From epeter at openjdk.org Mon May 12 09:50:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 09:50:36 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v13] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/b161b662..f09c5369 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=11-12 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon May 12 09:50:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 09:50:36 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:46:37 GMT, Emanuel Peter wrote: >> Honestly, I don't yet have a clear answer for this. Hmm. >> I'm not sure this is the best place to give this guidance. >> >> I guess the difference is to use a separate "token" vs a hashtag replacement. >> - token: can paste anything. But it requires you to interrupt the string and add commas. That can be a little clunky. And: you can only do a recursive Template call with the token method. >> - hashtag: you need it captured as string, either by a template argument or `let`. Does not allow recursive template calls. But it looks a little nicer cosmetically. >> >> Is this somewhat helpful? Maybe I can put that somewhere later in the tutorial? What do you think? > > Maybe my guidance would be to prefer hashtag, if need be with a `let`. Especially if it is about inserting something on the same line. 
> If it is on a new line, then the token method looks nicer often. For example if you stream over a list. > And recursive Template calls just have to be "tokens". ![image](https://github.com/user-attachments/assets/51183ba0-1fce-4057-b7d4-bad62cade181) I added this section here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2084285448 From epeter at openjdk.org Mon May 12 09:50:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 09:50:36 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 20:37:27 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 143: >> >>> 141: public static void main() { >>> 142: """, >>> 143: templateHello.withArgs(), >> >> `withArgs()` looks strange when there are no args. Could we find a better name for it? But maybe I'm missing a pattern here. > > Hmm, yeah, that is a slight concern. But it does return a `TemplateWithArgs`, which means a template that knows all the arguments already. This one happens to be a zero-arg version. > > I suppose I could rename it to `withArgsNone()` or `withZeroArgs` or `withNoArgs` for the zero-args version? Would that be an improvement? I will do the refactoring we discussed offline, with `UnfilledTemplate` and `FilledTemplate`. And `Template.make` with zero args directly generates a FilledTemplate. 
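The `#name` and `$var` replacements described in the PR summary earlier in this thread can be pictured with a small standalone sketch. It does not use the Template Framework API: the `render` helper, its signature, and the `_<id>` suffix scheme are hypothetical, chosen only to mirror the `var_7` / `var_42` example from the description.

```java
import java.util.Map;

// Illustrative only: a hypothetical stand-in for the two replacement kinds,
// not the Template Framework implementation.
final class ReplacementSketch {
    // Replaces every "#key" with its argument value, then renames every
    // "$name" to "name_<id>" so that two uses of one body do not collide.
    static String render(String body, Map<String, String> hashtags, int id) {
        String out = body;
        for (Map.Entry<String, String> e : hashtags.entrySet()) {
            out = out.replace("#" + e.getKey(), e.getValue());
        }
        // "$1" in the replacement refers to the captured variable name.
        return out.replaceAll("\\$(\\w+)", "$1_" + id);
    }

    public static void main(String[] args) {
        String body = "int $var = #value;\n";
        // The same body used twice yields distinct variable names.
        System.out.print(render(body, Map.of("value", "42"), 7));
        System.out.print(render(body, Map.of("value", "43"), 42));
    }
}
```

Under these assumptions the two calls produce `int var_7 = 42;` and `int var_42 = 43;`, so the generated snippets can sit in the same scope without a name clash.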
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2084289706 From adinn at openjdk.org Mon May 12 09:50:52 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 May 2025 09:50:52 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Mon, 12 May 2025 09:37:24 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). >> >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op >> MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op >> MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op >> MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Stub stack frame src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2619: > 2617: > 2618: __ bind(tail); > 2619: // __ add(count, count, 64); I can see why you commented this out (and prefer that to deleting it). However, a comment explaining why it is not needed might avoid maintainers being side-tracked.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084293642 From aph at openjdk.org Mon May 12 10:06:51 2025 From: aph at openjdk.org (Andrew Haley) Date: Mon, 12 May 2025 10:06:51 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: <-74qcaU_hYstfJykNB6hvffkOH9WQVtZOVNza8TGdMs=.10f80939-81e6-403e-930c-8e12ee6cb35d@github.com> On Mon, 12 May 2025 09:48:03 GMT, Andrew Dinn wrote: >> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Stub stack frame > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 2619: > >> 2617: >> 2618: __ bind(tail); >> 2619: // __ add(count, count, 64); > > I can see why you commented this out (and prefer that to deleting it). However, a comment explaining why it is not needed might avoid maintainers being side-tracked. Eh, people enjoy a little puzzle. OK. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25147#discussion_r2084323192 From bkilambi at openjdk.org Mon May 12 10:13:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 12 May 2025 10:13:55 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Mon, 12 May 2025 08:56:42 GMT, Xiaohong Gong wrote: >> Not at the moment. Thanks for pointing this out. I think for this PR, I will remove the predicated instructions support and for now only keep support for non-masked ones. I will add this support when there will be more focus on the masked versions once VectorAPI with Float16Vector is integrated with mainline. Hope this is ok. > > Sounds good to me. But I'm worried it may crash with bad ad file on AArch64 if the Vector API java and compiler IR part is ready for HF types, while the AArch64 relative masked rules are missing. 
> Because the masked vector IR has been generated, while the codegen is missing on AArch64. We have to add the HF ops to `match_rule_supported_vector_masked` first, and then remove them when adding the masked version rules. WDYT?

It's a good idea to disable masked ops in the backend. That should help not generate the masked IR in the inline expanders. I'll update this PR soon. Thanks for your comments.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2084333542

From jbhateja at openjdk.org Mon May 12 10:59:29 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 12 May 2025 10:59:29 GMT
Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v3]
In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com>
References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com>
Message-ID:

> This is a follow-up PR#22755 to improve Float16 operations inferencing.
>
> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Enabling some test points ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24179/files - new: https://git.openjdk.org/jdk/pull/24179/files/4a7a519b..016f0bc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=01-02 Stats: 22 lines in 3 files changed: 5 ins; 7 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From mhaessig at openjdk.org Mon May 12 11:31:56 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 12 May 2025 11:31:56 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. > > Testing: > Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. The tests all passed. ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25141#pullrequestreview-2832886166 From mli at openjdk.org Mon May 12 11:42:23 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 12 May 2025 11:42:23 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization Message-ID: Hi, Can you help to review this patch? It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. Thanks! ## Test performance test running in progress ... 
------------- Commit messages: - assert - initial commit Changes: https://git.openjdk.org/jdk/pull/25181/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25181&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350960 Stats: 154 lines in 3 files changed: 142 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25181/head:pull/25181 PR: https://git.openjdk.org/jdk/pull/25181 From bkilambi at openjdk.org Mon May 12 11:42:51 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 12 May 2025 11:42:51 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. > > Testing: > Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. Thanks everyone for your reviews. Can I ask any of you to please sponsor this patch? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25141#issuecomment-2872210187 From duke at openjdk.org Mon May 12 11:42:51 2025 From: duke at openjdk.org (duke) Date: Mon, 12 May 2025 11:42:51 GMT Subject: RFR: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: <9UM32TyJjyKXjegCfB3MG_knEg2W-iOGmy8IZmAw2LA=.17c391f7-d7b7-4933-ad2a-6aae2840feb8@github.com> On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. 
>
> Testing:
> Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64.

@Bhavana-Kilambi
Your change (at version 6485c0e1668d3e5d384e74c8d1fc9574d5e546b2) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25141#issuecomment-2872216373

From mhaessig at openjdk.org Mon May 12 11:45:51 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 12 May 2025 11:45:51 GMT
Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 23:56:14 GMT, Dean Long wrote:

> so it's probably better to treat jsr the same as goto and add it to falls_through.

I added `jsr` to `falls_through()`.

> The comment justifying this logic claims the compiler can emit debug information with the "after" state instead of the "before" state. If this is not true, then we can remove this VerifyStack logic. If it is true, it would be good to understand exactly under what circumstances it can happen.

Isn't this just what is happening in this particular bug? However, it is not the debug info that is emitted before or after; rather, the `bci` is advanced in rare cases. Granted, I have not been able to look into that in depth, because I am unfamiliar with those debuginfo emitting codepaths, but a cursory look seemed to suggest that it is always emitted at the end of `PhaseOutput::Process_OopMap_Node` which would be the same for all bytecodes.

> It seems like it would be safe to skip the falls_through/try_next_mask logic for the top frame if we are reexecuting at the current bci, but that is a riskier change.

I agree, but I don't see a good way to find out if the `bci` has been moved.
A hacky solution might be to check if the deopt reason is `assert_null_or_unreachable0` since the `bci` is only moved in some deopts with that reason. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2872235549 From jbhateja at openjdk.org Mon May 12 12:17:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 12 May 2025 12:17:11 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v2] In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. 
>
>
> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24
>    EVEX          OP   MR   SIB  DISP          IMM
>    -------------|----|----|----|-------------|----|
>    62 6b c1 40   25   84   ec   40 30 20 10   ff
>
> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24
>    For full vector width operation, scalar matches with vector size, hence scale N = 64
>    effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1
>    EVEX          OP   MR   SIB  DISP          IMM
>    -------------|----|----|----|-------------|----|
>    62 6b c1 40   25   44   ec   01            ff
>
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Correcting tuple types in some assembler routines

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25021/files
  - new: https://git.openjdk.org/jdk/pull/25021/files/127ff6f1..3e0f0410

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=00-01

  Stats: 47 lines in 1 file changed: 1 ins; 0 del; 46 mod
  Patch: https://git.openjdk.org/jdk/pull/25021.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25021/head:pull/25021

PR: https://git.openjdk.org/jdk/pull/25021

From jbhateja at openjdk.org Mon May 12 12:17:11 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 12 May 2025 12:17:11 GMT
Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry
In-Reply-To:
References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com>
Message-ID:

On Mon, 12 May 2025 06:33:40 GMT, Emanuel Peter wrote:

> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results!
> :)

Please use the latest version

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2872313317

From jbhateja at openjdk.org Mon May 12 12:17:11 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 12 May 2025 12:17:11 GMT
Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v2]
In-Reply-To:
References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com>
Message-ID: <9eOTeCgUZb5gOKid64hu3ZjC_IcFm7K8-hrl9pOnTjQ=.5bff4fb9-525c-4da5-9213-fbed1a6fccfd@github.com>

On Tue, 6 May 2025 16:40:31 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Correcting tuple types in some assembler routines
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 11571:
>
>> 11569:   attributes.set_is_evex_instruction();
>> 11570:   attributes.set_embedded_opmask_register_specifier(mask);
>> 11571:   attributes.set_address_attributes(/* tuple_type */ EVEX_FV, /* input_size_in_bits */ EVEX_NObit);
>
> @jatin-bhateja How are these `perm` cases related to the `min / max` cases that were reported? I did also not find a test below.

@eme64, it seems this routine is not currently used in JIT code. I am making it functionally correct. There are a few other cases which I have discovered, and I think it's appropriate to correct those as well in this PR. I am modifying the PR title to match the patch.
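For readers following the compressed-displacement rule from this PR's description, here is a minimal sketch of the disp8*N check, written as standalone C++ rather than HotSpot code. The helper names (`fits_compressed_disp8`, `displacement_bytes`, `compressed_disp8`, `check_disp8_examples`) are hypothetical and do not come from `assembler_x86.cpp`. The sketch assumes the operand already carries a displacement (it ignores the case where a zero displacement is omitted entirely), and it makes explicit one condition that the description leaves implicit: besides being a multiple of the scale N, the quotient must fit in a signed byte. That is why `0x10203040`, although a multiple of 64, still needs a 4-byte displacement.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from assembler_x86.cpp.

// True if `disp` can be encoded as a compressed disp8 with scale factor `n`:
// it must be a multiple of n, and disp / n must fit in a signed 8-bit value.
constexpr bool fits_compressed_disp8(int32_t disp, int32_t n) {
  return n > 0 && disp % n == 0 && disp / n >= -128 && disp / n <= 127;
}

// Number of displacement bytes for a memory operand that has a displacement.
constexpr int displacement_bytes(int32_t disp, int32_t n) {
  return fits_compressed_disp8(disp, n) ? 1 : 4;
}

// The byte stored in the instruction; only meaningful when
// fits_compressed_disp8(disp, n) holds.
constexpr int8_t compressed_disp8(int32_t disp, int32_t n) {
  return static_cast<int8_t>(disp / n);
}

// Re-checks the two vpternlogq examples from the PR description, where the
// instruction operates on a full 64-byte vector and therefore N = 64.
inline void check_disp8_examples() {
  assert(displacement_bytes(0x10203040, 64) == 4);  // DISP bytes: 40 30 20 10
  assert(displacement_bytes(0x40, 64) == 1);        // DISP byte:  01
  assert(compressed_disp8(0x40, 64) == 0x01);
}
```

The scale N itself is what the tuple type and input size passed to `set_address_attributes` are meant to determine, so a wrong tuple type would feed a wrong N into a check like this one.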
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2084534821

From mhaessig at openjdk.org Mon May 12 12:18:57 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 12 May 2025 12:18:57 GMT
Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 13:47:41 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this patch?
>
> An API combination like `av.lanewise(VectorOperators.FMA, bv.neg(), cv, mask).intoArray(fr, i);` should be optimized to a single special instruct rather than a series of instructs.
>
> We cannot verify the IR node (via `beforeMatchingNameRegex`) in the PrintIdeal phase, because that does not guarantee that the special instruct is generated in the FINAL CODE phase; it could still be a combination of several different instructs.
>
> As RISC-V uses different instruct names, I created new ones in IRNode.java.
>
> Thanks

The tests passed.

-------------

Marked as reviewed by mhaessig (Author).

PR Review: https://git.openjdk.org/jdk/pull/25145#pullrequestreview-2833003074

From mli at openjdk.org Mon May 12 12:28:05 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 12 May 2025 12:28:05 GMT
Subject: RFR: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java
In-Reply-To:
References:
Message-ID:

On Mon, 12 May 2025 12:15:56 GMT, Manuel Hässig wrote:

> The tests passed.

Thank you for testing!
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25145#issuecomment-2872351153

From mli at openjdk.org Mon May 12 12:28:05 2025
From: mli at openjdk.org (Hamlin Li)
Date: Mon, 12 May 2025 12:28:05 GMT
Subject: Integrated: 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java
In-Reply-To:
References:
Message-ID: <0hEdCPTGYXHTcFqehTlBKHmiG2QuZJNP-tDMWZRYYb8=.104dc88b-fca7-4745-bfdd-0d4478f267cc@github.com>

On Fri, 9 May 2025 13:47:41 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this patch?
>
> An API combination like `av.lanewise(VectorOperators.FMA, bv.neg(), cv, mask).intoArray(fr, i);` should be optimized to a single special instruct rather than a series of instructs.
>
> We cannot verify the IR node (via `beforeMatchingNameRegex`) in the PrintIdeal phase, because that does not guarantee that the special instruct is generated in the FINAL CODE phase; it could still be a combination of several different instructs.
>
> As RISC-V uses different instruct names, I created new ones in IRNode.java.
>
> Thanks

This pull request has now been integrated.
Changeset: 8545e135 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/8545e1357142db2e008970095a3f74f8121dbcf2 Stats: 57 lines in 2 files changed: 34 ins; 0 del; 23 mod 8356642: RISC-V: enable hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java Reviewed-by: fyang, fjiang, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25145 From shade at openjdk.org Mon May 12 13:01:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 13:01:52 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Fri, 9 May 2025 23:23:04 GMT, Vladimir Kozlov wrote: >> @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. >> >> AOT code caching should be limited to supported platforms: x64 and aarch64. >> >> Testing: GHA > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Limit platforms to run AOTCode test src/hotspot/share/code/aotCodeCache.cpp line 92: > 90: > 91: void AOTCodeCache::initialize() { > 92: #if !(defined(AMD64) || defined(AARCH64)) Sounds like we need to also take care about Zero? E.g. `defined(ZERO) || (!defined(AMD64) && !defined(AARCH64))`? 
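
To make the difference between the guard in the patch and the suggested one concrete, here is a minimal sketch that models each preprocessor condition as a boolean function. This is an illustration only: the function names are hypothetical, each `bool` stands for "the corresponding macro is defined", and the sketch assumes that a Zero build can still define `AMD64` or `AARCH64` for the host CPU, which is what the question above implies.

```cpp
#include <cassert>

// Guard currently in the patch: !(defined(AMD64) || defined(AARCH64)).
// The Zero flag is ignored, so its parameter is left unnamed.
constexpr bool unsupported_current(bool /*zero*/, bool amd64, bool aarch64) {
  return !(amd64 || aarch64);
}

// Suggested guard: defined(ZERO) || (!defined(AMD64) && !defined(AARCH64)).
constexpr bool unsupported_suggested(bool zero, bool amd64, bool aarch64) {
  return zero || (!amd64 && !aarch64);
}

// Counts the macro combinations on which the two guards disagree.
inline int count_guard_differences() {
  int differences = 0;
  for (int bits = 0; bits < 8; ++bits) {
    const bool zero    = (bits & 1) != 0;
    const bool amd64   = (bits & 2) != 0;
    const bool aarch64 = (bits & 4) != 0;
    if (unsupported_current(zero, amd64, aarch64) !=
        unsupported_suggested(zero, amd64, aarch64)) {
      // The guards only disagree for Zero builds on an otherwise supported CPU.
      assert(zero && (amd64 || aarch64));
      ++differences;
    }
  }
  return differences;
}
```

In this model the two guards agree whenever `ZERO` is undefined, and the suggested guard additionally reports "unsupported" for Zero builds on x64 or aarch64. By De Morgan's laws, negating the suggested guard gives `!defined(ZERO) && (defined(AMD64) || defined(AARCH64))`, the "supported" form of the same condition.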
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25158#discussion_r2084615238

From shade at openjdk.org Mon May 12 13:01:53 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Mon, 12 May 2025 13:01:53 GMT
Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2]
In-Reply-To:
References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com>
Message-ID:

On Mon, 12 May 2025 12:56:07 GMT, Aleksey Shipilev wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Limit platforms to run AOTCode test
>
> src/hotspot/share/code/aotCodeCache.cpp line 92:
>
>> 90:
>> 91: void AOTCodeCache::initialize() {
>> 92: #if !(defined(AMD64) || defined(AARCH64))
>
> Sounds like we need to also take care about Zero? E.g. `defined(ZERO) || (!defined(AMD64) && !defined(AARCH64))`?

Or invert it and put it at `else`, reads better and matches other blocks around Hotspot.

    #if !defined(ZERO) && (defined(AMD64) || defined(AARCH64))
      ...
    #else
      log_info(aot, codecache, init)("AOT Code Cache is not supported on this platform.");
      AOTAdapterCaching = false;
      return;
    #endif

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25158#discussion_r2084619766

From mhaessig at openjdk.org Mon May 12 13:08:24 2025
From: mhaessig at openjdk.org (Manuel Hässig)
Date: Mon, 12 May 2025 13:08:24 GMT
Subject: RFR: 8355970: C2: Add command line option to print the compile phases
Message-ID:

This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened.

Output with `-XX:PrintPhaseLevel=2` > java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java CompileCommand: compileonly TestLoop.test10 bool compileonly = true CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) 3584 99 b 3 TestLoop::test10 (64 bytes) 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) 1. After Parsing 2. Iter GVN 1 3. Incremental Inline 4. Incremental Boxing Inline 5. Before Loop Optimizations 6. PhaseIdealLoop 1 7. PhaseIdealLoop 2 8. PhaseIdealLoop 3 9. Before PhaseCCP 1 10. PhaseCCP 1 11. Iter GVN 2 12. PhaseIdealLoop iterations 13. After Loop Optimizations 14. After Macro Expansion 15. Barrier expand 16. Optimize finished 17. Before matching 18. After matching 19. Global code motion 20. Register Allocation 21. Final Code 3668 103 b 4 TestLoop::test10 (64 bytes) 1. After Parsing 2. Iter GVN 1 3. Incremental Inline 4. Incremental Boxing Inline 5. Before Loop Optimizations 6. PhaseIdealLoop 1 7. PhaseIdealLoop 2 8. PhaseIdealLoop 3 9. Before PhaseCCP 1 10. PhaseCCP 1 11. Iter GVN 2 12. PhaseIdealLoop iterations 13. PhaseIdealLoop iterations 2 14. PhaseIdealLoop iterations 3 15. PhaseIdealLoop iterations 4 16. PhaseIdealLoop iterations 5 17. PhaseIdealLoop iterations 6 18. PhaseIdealLoop iterations 7 19. PhaseIdealLoop iterations 8 20. PhaseIdealLoop iterations 9 21. After Loop Optimizations 22. After Macro Expansion 23. Barrier expand 24. Optimize finished 25. Before matching 26. After matching 27. Global code motion 28. Register Allocation 29. Final Code
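
Comparing phase decisions between two compilations can be sketched as follows. This is a minimal illustration, assuming the printed phase lines have the `N. Name` shape shown in the output above and that each compilation's lines have already been collected into a vector; `phase_name` and `first_divergence` are hypothetical helper names and are not part of the patch. The numeric prefix is stripped before comparing because the same phase gets a different number once the sequences diverge (for example `14. After Macro Expansion` versus `22. After Macro Expansion`).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Strips a leading "N. " counter from a phase line such as
// "13. After Loop Optimizations". Lines without such a prefix are returned as-is.
inline std::string phase_name(const std::string& line) {
  const std::size_t dot = line.find(". ");
  if (dot == std::string::npos || dot == 0) {
    return line;
  }
  for (std::size_t i = 0; i < dot; ++i) {
    if (!std::isdigit(static_cast<unsigned char>(line[i]))) {
      return line;  // text before the dot is not a plain counter
    }
  }
  return line.substr(dot + 2);
}

// Returns the index of the first position at which the two phase sequences
// differ by name. If one sequence is a prefix of the other (or both are
// equal), the length of the shorter sequence is returned.
inline std::size_t first_divergence(const std::vector<std::string>& a,
                                    const std::vector<std::string>& b) {
  std::size_t i = 0;
  while (i < a.size() && i < b.size() && phase_name(a[i]) == phase_name(b[i])) {
    ++i;
  }
  return i;
}
```

Applied to the two compilations listed above, such a comparison would stop at the 13th entry, where the first compilation prints `After Loop Optimizations` and the second prints `PhaseIdealLoop iterations 2`.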
Output with `-XX:PrintPhaseLevel=2` in conjunction with loop opt tracing > java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test* -XX:CompileCommand=printcompilation,TestLoop.test* -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:PrintPhaseLevel=2 TestLoop.java CompileCommand: compileonly TestLoop.test* bool compileonly = true CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true 3016 98 % b 3 TestLoop::test10 @ 2 (64 bytes) 3040 99 b 3 TestLoop::test10 (64 bytes) 3097 100 % b 4 TestLoop::test10 @ 2 (64 bytes) 1. After Parsing 2. Iter GVN 1 3. Incremental Inline 4. Incremental Boxing Inline 5. Before Loop Optimizations Loop: N0/N0 has_sfpt Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC 
Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } Predicate IC Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } 6. PhaseIdealLoop 1 Loop: N0/N0 has_sfpt Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } 7. PhaseIdealLoop 2 Loop: N0/N0 has_sfpt Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } 8. PhaseIdealLoop 3 9. Before PhaseCCP 1 10. PhaseCCP 1 11. Iter GVN 2 Loop: N0/N0 has_sfpt Loop: N1894/N1845 limit_check profile_predicated predicated sfpts={ 1845 } PredicatesOff 12. PhaseIdealLoop iterations Loop: N0/N0 has_sfpt Loop: N1894/N1845 profile_predicated predicated sfpts={ 1845 } 13. After Loop Optimizations 14. After Macro Expansion 15. Barrier expand 16. Optimize finished 17. Before matching 18. After matching 19. Global code motion 20. Register Allocation 21. Final Code 3126 103 b 4 TestLoop::test10 (64 bytes) 1. After Parsing 2. Iter GVN 1 3. Incremental Inline 4. Incremental Boxing Inline 5. Before Loop Optimizations Loop: N0/N0 has_sfpt Loop: N1912/N1855 limit_check profile_predicated predicated sfpts={ 1777 } 6. 
PhaseIdealLoop 1 Counted Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Loop: N0/N0 has_sfpt Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Predicate IC Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } 7. PhaseIdealLoop 2 Loop: N0/N0 has_sfpt Loop: N1924/N1855 limit_check profile_predicated predicated sfpts={ 1777 } Peel Loop: N2112/N1855 sfpts={ 1777 } Exceeding node budget: 0 < 141 8. PhaseIdealLoop 3 9. Before PhaseCCP 1 10. PhaseCCP 1 11. 
Iter GVN 2 Counted Loop: N2288/N1855 counted [1,int),+1 (-1 iters) Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2287/N2286 limit_check profile_predicated predicated Loop: N2288/N1855 limit_check profile_predicated predicated counted [1,int),+1 (-1 iters) has_sfpt strip_mined Predicate RC Loop: N2288/N1855 limit_check profile_predicated predicated counted [1,int),+1 (40920 iters) has_sfpt rce strip_mined 12. PhaseIdealLoop iterations Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2287/N2286 limit_check profile_predicated predicated sfpts={ 2289 } Loop: N2288/N1855 limit_check profile_predicated predicated counted [1,int),+1 (40920 iters) rc has_sfpt strip_mined PreMainPost Loop: N2288/N1855 limit_check profile_predicated predicated counted [1,int),+1 (40920 iters) rc has_sfpt strip_mined Unroll 2 Loop: N2288/N1855 limit_check counted [int,int),+1 (40920 iters) main rc has_sfpt strip_mined 13. PhaseIdealLoop iterations 2 Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2287/N2286 limit_check sfpts={ 2289 } Loop: N2611/N1855 limit_check counted [int,int),+2 (40920 iters) main rc has_sfpt strip_mined Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc Unroll 4 Loop: N2611/N1855 limit_check counted [int,int),+2 (40920 iters) main rc has_sfpt rce strip_mined 14. 
PhaseIdealLoop iterations 3 Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2287/N2286 limit_check sfpts={ 2289 } Loop: N2741/N1855 limit_check counted [int,int),+4 (40920 iters) main rc has_sfpt strip_mined Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc Peel Loop: N2875/N1855 has_sfpt rce Exceeding node budget: 0 < 258 15. PhaseIdealLoop iterations 4 Counted Loop: N3236/N1855 counted [4,int),+4 (-1 iters) Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2852/N2849 limit_check sfpts={ 3057 } Loop: N3235/N3234 limit_check profile_predicated predicated Loop: N3236/N1855 limit_check profile_predicated predicated counted [4,int),+4 (-1 iters) has_sfpt strip_mined Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc Predicate RC Loop: N3236/N1855 limit_check profile_predicated predicated counted [4,int),+4 (40920 iters) has_sfpt rce strip_mined Predicate RC Loop: N3236/N1855 limit_check profile_predicated predicated counted [4,int),+4 (40920 iters) has_sfpt rce strip_mined Predicate RC Loop: N3236/N1855 limit_check profile_predicated predicated counted [4,int),+4 (40920 iters) has_sfpt rce strip_mined 16. 
PhaseIdealLoop iterations 5 Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2852/N2849 limit_check sfpts={ 3057 } Loop: N3235/N3234 limit_check profile_predicated predicated sfpts={ 3237 } Loop: N3236/N1855 limit_check profile_predicated predicated counted [4,int),+4 (40920 iters) rc has_sfpt strip_mined Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc PreMainPost Loop: N3236/N1855 limit_check profile_predicated predicated counted [4,int),+4 (40920 iters) rc has_sfpt strip_mined Unroll 2 Loop: N3236/N1855 limit_check counted [int,int),+4 (40920 iters) main rc has_sfpt strip_mined Exceeding node budget: 479 < 569 17. PhaseIdealLoop iterations 6 Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2852/N2849 limit_check sfpts={ 3057 } Loop: N3795/N3799 limit_check profile_predicated predicated counted [4,int),+4 (4 iters) pre rc Loop: N3235/N3234 limit_check sfpts={ 3237 } Loop: N4050/N1855 limit_check counted [int,int),+8 (40920 iters) main rc has_sfpt strip_mined Loop: N3562/N3566 limit_check counted [int,int),+4 (4 iters) post rc Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc Peel Loop: N4281/N1855 has_sfpt rce Exceeding node budget: 0 < 196 18. 
PhaseIdealLoop iterations 7 Counted Loop: N4512/N1855 counted [8,int),+8 (-1 iters) Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2852/N2849 limit_check sfpts={ 3057 } Loop: N3795/N3799 limit_check profile_predicated predicated counted [4,int),+4 (4 iters) pre rc Loop: N4258/N4255 limit_check sfpts={ 4402 } Loop: N4511/N4510 limit_check profile_predicated predicated Loop: N4512/N1855 limit_check profile_predicated predicated counted [8,int),+8 (-1 iters) has_sfpt strip_mined Loop: N3562/N3566 limit_check counted [int,int),+4 (4 iters) post rc Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc PreMainPost Loop: N4512/N1855 limit_check profile_predicated predicated counted [8,int),+8 (40920 iters) has_sfpt rce strip_mined RangeCheck Loop: N4512/N1855 counted [int,int),+8 (40920 iters) main has_sfpt rce strip_mined 19. PhaseIdealLoop iterations 8 Loop: N0/N0 has_sfpt Loop: N2088/N2085 limit_check profile_predicated predicated sfpts={ 2168 } Loop: N2499/N2498 limit_check profile_predicated predicated counted [1,int),+1 (4 iters) pre rc Loop: N2852/N2849 limit_check sfpts={ 3057 } Loop: N3795/N3799 limit_check profile_predicated predicated counted [4,int),+4 (4 iters) pre rc Loop: N4258/N4255 limit_check sfpts={ 4402 } Loop: N4731/N4737 limit_check profile_predicated predicated counted [8,int),+8 (4 iters) pre rc Loop: N4511/N4510 limit_check sfpts={ 4513 } Loop: N4512/N1855 limit_check counted [int,int),+8 (40920 iters) main rc has_sfpt strip_mined Loop: N4616/N4622 counted [int,int),+8 (4 iters) post rc Loop: N3562/N3566 limit_check counted [int,int),+4 (4 iters) post rc Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc PredicatesOff 20. 
PhaseIdealLoop iterations 9 Loop: N0/N0 has_sfpt Loop: N2088/N2085 predicated sfpts={ 2168 } Loop: N2499/N2498 predicated counted [1,int),+1 (4 iters) pre rc Loop: N2852/N2849 limit_check sfpts={ 3057 } Loop: N3795/N3799 predicated counted [4,int),+4 (4 iters) pre rc Loop: N4258/N4255 limit_check sfpts={ 4402 } Loop: N4731/N4737 counted [8,int),+8 (4 iters) pre rc Loop: N4511/N4510 limit_check sfpts={ 4513 } Loop: N4512/N1855 limit_check counted [int,int),+8 (40920 iters) main rc has_sfpt strip_mined Loop: N4616/N4622 counted [int,int),+8 (4 iters) post rc Loop: N3562/N3566 limit_check counted [int,int),+4 (4 iters) post rc Loop: N2402/N2401 limit_check counted [int,int),+1 (4 iters) post rc 21. After Loop Optimizations 22. After Macro Expansion 23. Barrier expand 24. Optimize finished 25. Before matching 26. After matching 27. Global code motion 28. Register Allocation 29. Final Code
## Testing

- [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14972614510)
- [ ] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing
- [ ] tier1 through tier2 with `-XX:PrintPhaseLevel=4`

-------------

Commit messages:
 - Introduce PrintPhaseLevel flag
 - Rename IR printing functions

Changes: https://git.openjdk.org/jdk/pull/25183/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8355970
  Stats: 41 lines in 5 files changed: 33 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/25183.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25183/head:pull/25183

PR: https://git.openjdk.org/jdk/pull/25183

From chagedorn at openjdk.org Mon May 12 13:11:06 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 12 May 2025 13:11:06 GMT
Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 18:10:09 GMT, Daniel Lundén wrote:

>> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented.
>>
>> ### Changeset
>>
>> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate.
>> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`.
>> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/gcm.cpp > > Co-authored-by: Roberto Casta?eda Lozano Nice documentation! A lot of minor comments/suggestions while reading through the comments in the entire method. src/hotspot/share/opto/gcm.cpp line 666: > 664: }; > 665: > 666: //------------------------raise_above_anti_dependences--------------------------- These legacy headers can be removed when we touch them. Suggestion: src/hotspot/share/opto/gcm.cpp line 667: > 665: > 666: //------------------------raise_above_anti_dependences--------------------------- > 667: // Enforce a scheduling of the argument load that ensures anti-dependent stores I suggest to add `'` to make a mapping to the parameter: Suggestion: // Enforce a scheduling of the given 'load' that ensures anti-dependent stores Same on L670 but somehow I cannot make a suggestion there. Maybe due to the existing PR comment? src/hotspot/share/opto/gcm.cpp line 672: > 670: // The argument load has a current scheduling range in the dominator tree that > 671: // starts at the load's early block (computed in schedule_early) and ends at > 672: // the argument LCA block. However, there may still exist anti-dependent stores argument = load? Suggestion: // the load's LCA block. 
However, there may still exist anti-dependent stores src/hotspot/share/opto/gcm.cpp line 673: > 671: // starts at the load's early block (computed in schedule_early) and ends at > 672: // the argument LCA block. However, there may still exist anti-dependent stores > 673: // in between the early block and the LCA that overwrite memory that the load Suggestion: // between the early block and the LCA that overwrite memory that the load src/hotspot/share/opto/gcm.cpp line 679: > 677: // latest in the store's block, and > 678: // 2. if the load may get scheduled in the store's block, additionally insert > 679: // an anti-dependence edge from the load to the store to ensure LCM Maybe you can also mention here that this is done by adding a precedence edge: Suggestion: // an anti-dependence edge (i.e. precedence edge) from the load to the store to ensure LCM src/hotspot/share/opto/gcm.cpp line 685: > 683: // path relative to the load if there are no paths from early to LCA that go > 684: // through the store's block. Such stores are not anti-dependent, and there is > 685: // no need to update the LCA nor to add anti-dependence edges. Suggestion: // no need to update the load's LCA nor to add anti-dependence edges. src/hotspot/share/opto/gcm.cpp line 721: > 719: // > 720: // The raise_above_anti_dependences method returns the updated LCA and ensures > 721: // there are no anti-dependent stores between the load's early block and the Maybe to be explicit: Suggestion: // The raise_above_anti_dependences method returns the updated LCA and ensures // there are no anti-dependent stores in any block between the load's early block and the src/hotspot/share/opto/gcm.cpp line 724: > 722: // updated LCA. Any stores in the updated LCA will have new anti-dependence > 723: // edges back to the load. The caller may schedule the load in the LCA, or it > 724: // may hoist the load above the LCA, if it is not the early block. 
Suggestion: // may hoist the load above the LCA, if the updated LCA is not the early block. src/hotspot/share/opto/gcm.cpp line 725: > 723: // edges back to the load. The caller may schedule the load in the LCA, or it > 724: // may hoist the load above the LCA, if it is not the early block. > 725: Block* PhaseCFG::raise_above_anti_dependences(Block* LCA, Node* load, bool verify) { Could not hurt: Suggestion: Block* PhaseCFG::raise_above_anti_dependences(Block* LCA, Node* load, const bool verify) { src/hotspot/share/opto/gcm.cpp line 758: > 756: node_idx_t load_index = load->_idx; > 757: > 758: // Note the earliest legal placement of 'load', as determined by Note = get? Suggestion: // Get the earliest legal placement of 'load', as determined by src/hotspot/share/opto/gcm.cpp line 760: > 758: // Note the earliest legal placement of 'load', as determined by > 759: // the unique point in the dominator tree where all memory effects > 760: // and other inputs are first available. (Computed by schedule_early.) Suggestion: // and other inputs are first available (computed by schedule_early). src/hotspot/share/opto/gcm.cpp line 762: > 760: // and other inputs are first available. (Computed by schedule_early.) > 761: // For normal loads, 'early' is the shallowest place (dom graph wise) > 762: // to look for anti-deps between this load and any store. Just noticed when reading through the method. Cannot suggest since it's hidden: L766-768: - different than the schedule_early block in that it could be -> different from the schedule_early block when it is - anti-dependences -> anti-dependencies. src/hotspot/share/opto/gcm.cpp line 780: > 778: ResourceArea* area = Thread::current()->resource_area(); > 779: > 780: // Bookkeeping of possibly anti-dependent stores that we find outside of the Suggestion: // Bookkeeping of possibly anti-dependent stores that we find outside the src/hotspot/share/opto/gcm.cpp line 797: > 795: // The input load uses some memory state (initial_mem). 
> 796: Node* initial_mem = load->in(MemNode::Memory); > 797: // To find anti-dependences we must look for users of the same memory state. Suggestion: // To find anti-dependencies, we must look for users of the same memory state. src/hotspot/share/opto/gcm.cpp line 854: > 852: // - just past a MergeMem with the edge (MergeMem, use_mem_state). > 853: assert(def_mem_state == nullptr || def_mem_state == initial_mem || > 854: def_mem_state->is_MergeMem(), Suggestion: assert(def_mem_state == nullptr || def_mem_state == initial_mem || def_mem_state->is_MergeMem(), src/hotspot/share/opto/gcm.cpp line 857: > 855: "unexpected memory state"); > 856: > 857: uint op = use_mem_state->Opcode(); For good measure: Suggestion: const uint op = use_mem_state->Opcode(); src/hotspot/share/opto/gcm.cpp line 894: > 892: > 893: assert(!use_mem_state->is_MergeMem(), > 894: "use_mem_state should be either a store or a memory Phi"); Suggestion: assert(!use_mem_state->is_MergeMem(), "use_mem_state should be either a store or a memory Phi"); src/hotspot/share/opto/gcm.cpp line 964: > 962: if (use_mem_state->is_Phi()) { > 963: // We have reached a memory Phi node. On our search from initial_mem to > 964: // the Phi, we have found no anti-dependences (otherwise, we would have Suggestion: // the Phi, we have found no anti-dependencies (otherwise, we would have src/hotspot/share/opto/gcm.cpp line 973: > 971: // |?? > 972: // ||| > 973: // Phi How about: Suggestion: // def_mem_state // | // | ? ? // \ | / // Phi src/hotspot/share/opto/gcm.cpp line 1026: > 1024: } else if (use_mem_state_block != early) { > 1025: // We found an anti-dependent store outside the load's 'early' block. > 1026: // The store may be between the current LCA and earliest possible block Suggestion: // The store may be between the current LCA and the earliest possible block src/hotspot/share/opto/gcm.cpp line 1054: > 1052: } > 1053: } > 1054: // Worklist is now empty; we have visited all possible anti-dependences. 
Suggestion: // Worklist is now empty; we have visited all possible anti-dependencies. src/hotspot/share/opto/gcm.cpp line 1058: > 1056: // Finished if 'load' must be scheduled in its 'early' block. > 1057: // If we found any stores there, they have already been given > 1058: // precedence edges. Might be clearer since we always talked about anti-dependency edges while the concept to implement them are precedence edges. Suggestion: // anti-dependency edges. src/hotspot/share/opto/gcm.cpp line 1087: > 1085: // load from sinking past any block containing a store that may overwrite > 1086: // memory that the load must witness. > 1087: Suggestion: // // The raised LCA will be a lower bound for placing the load, preventing the // load from sinking past any block containing a store that may overwrite // memory that the load must witness. // ------------- PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2832880054 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084468882 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084471291 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084473750 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084474619 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084481473 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084486896 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084494420 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084496026 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084496602 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084585843 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084586372 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084595136 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084587574 PR 
Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084599234 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084606423 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084608271 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084612824 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084620462 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084622731 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084628087 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084629817 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084630749 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2084634773 From shade at openjdk.org Mon May 12 13:11:17 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 13:11:17 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v13] In-Reply-To: References: Message-ID: <2ydVKTAbomGLgJTwl-1jRBxgF4MRz0h-2CQmr9yHTxg=.0e094037-94b2-4627-92ef-01946fed014b@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. 
> > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with two additional commits since the last revision: - More thorough locking and redefinition escape hatch - Fix build failures: add more headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/1cdbed2b..ce737c5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=11-12 Stats: 114 lines in 7 files changed: 58 ins; 20 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From rcastanedalo at openjdk.org Mon May 12 13:43:55 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 12 May 2025 13:43:55 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases In-Reply-To: References: Message-ID: On Mon, 12 May 2025 13:03:16 GMT, Manuel H?ssig wrote: > This PR introduces the flag `-XX:PrintPhaseLevel` 
that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left sidebar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >
> Output with `-XX:PrintPhaseLevel=2`
>
>
>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java
> CompileCommand: compileonly TestLoop.test10 bool compileonly = true
> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true
> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes)
> 3584 99 b 3 TestLoop::test10 (64 bytes)
> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes)
> 1. After Parsing
> 2. Iter GVN 1
> 3. Incremental Inline
> 4. Incremental Boxing Inline
> 5. Before Loop Optimizations
> 6. PhaseIdealLoop 1
> 7. PhaseIdealLoop 2
> 8. PhaseIdealLoop 3
> 9. Before PhaseCCP 1
> 10. PhaseCCP 1
> 11. Iter GVN 2
> 12. PhaseIdealLoop iterations
> 13. After Loop Optimizations
> 14. After Macro Expansion
> 15. Barrier expand
> 16. Optimize finished
> 17. Before matching
> 18. After matching
> 19. Global code motion
> 20. Register Allocation
> 21. Final Code
> 3668 103 b 4 TestLoop::test10 (64 bytes)
> 1. After Parsing
> 2. Iter GVN 1
> 3. Incremental Inline
> 4. Incremental Boxing Inline
> 5. Before Loop Optimizations
> 6. PhaseIdealLoop 1
> 7. PhaseIdealLoop 2
> 8. PhaseIdealLoop 3
> 9. Before PhaseCCP 1
> 10. PhaseCCP 1
> 11. Iter GVN 2
> 12. PhaseIdealLoop iterations
> 13. PhaseIdealLoop iterations 2
> 14. PhaseIdealLoop iterations 3
> 15. PhaseIdealLoop iterations 4
> 16. PhaseIdealLoop iterations 5
> 17. PhaseIdealLoop iterations 6
> 18. PhaseIdealLoop iterations 7
> 19. PhaseIdealLoop iterations 8
> 20. PhaseIdealLoop iterations 9
> 21. After Loop Optimizations
> 22. After Macro Expansion
> 23. Barrier expand
> 24. Optimize finished
> 25. Before matching
> 26. After matching
> 27. Global code motion
> 28. Register Allocation
> 29. Final Code
>
>
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo...

Thanks for working on this, Manuel, looks very useful!

Have you considered using Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation`, which seems related. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2872627435

From epeter at openjdk.org Mon May 12 14:10:21 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 12 May 2025 14:10:21 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v14]
In-Reply-To:
References:
Message-ID: <8eoNdEZP4FwcfD-imK12Uq6OJXZ509KKTejqxxYHnzA=.8b2b6d85-163e-4fe1-b88e-afb4e7bd2475@github.com>

> **Goal**
> We want to generate Java source code:
> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain-specific fuzzers (e.g. random expressions and statements).
>
> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>
> **How to get started**
> When reviewing, please start by looking at:
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76
>
> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
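The `#name` and `$var` replacements listed in the description above can be illustrated with a small stand-alone sketch. This is not the Template Framework's implementation or API: the class `MiniTemplate`, its `render` method, and the counter-based `var_N` naming are assumptions made purely for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; the real framework lives in
// test/hotspot/jtreg/compiler/lib/template_framework and works differently.
final class MiniTemplate {
    // "#name" is a value replacement, "$name" is a variable that must be made unique.
    private static final Pattern PLACEHOLDER = Pattern.compile("([#$])([A-Za-z_][A-Za-z0-9_]*)");

    // Each call to render() counts as one template use and gets its own id.
    private int nextId = 1;

    String render(String body, Map<String, ?> args) {
        int id = nextId++;
        Matcher m = PLACEHOLDER.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (m.group(1).equals("#")) {
                // "#name": format the argument value directly into the output.
                if (!args.containsKey(name)) {
                    throw new IllegalArgumentException("missing argument: " + name);
                }
                replacement = String.valueOf(args.get(name));
            } else {
                // "$name": append the per-use id so that two uses do not collide.
                replacement = name + "_" + id;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        MiniTemplate t = new MiniTemplate();
        String body = "int $x = #value;";
        // prints: int x_1 = 42; int x_2 = 7;
        System.out.println(t.render(body, Map.of("value", 42)) + " " + t.render(body, Map.of("value", 7)));
    }
}
```

Rendering the same body twice yields two distinct variable names, which is the collision the `$var` feature is described as avoiding.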
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Refactoring TemplateWithArgs -> Filled/Unfilled Template ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/f09c5369..f240f9a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=12-13 Stats: 993 lines in 15 files changed: 366 ins; 313 del; 314 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From shade at openjdk.org Mon May 12 14:15:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:15:16 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v14] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. 
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Fix release builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/ce737c5a..f239c221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From epeter at openjdk.org Mon May 12 14:17:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 14:17:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v15] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). 
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix links to Template ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/f240f9a6..78d32491 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=13-14 Stats: 27 lines in 6 files changed: 3 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon May 12 14:26:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 14:26:37 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more documentation fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/78d32491..0871fcda Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=14-15 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Mon May 12 14:26:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 12 May 2025 14:26:42 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:59:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > Next batch of comments. Will probably resume tomorrow :-) @chhagedorn Ok, I tried my best with the `(Un)FilledTemplate` refactoring. I'm still not sure if I want to rename `FilledTemplate` to `RenderableTemplate`, it is not super satisfying for a beginner either. Naming is hard. If anybody else has a better idea than `(Un)FilledTemplate`, please let me know ;) I think one can continue reviewing this now! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2872783461 From mhaessig at openjdk.org Mon May 12 14:29:33 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 12 May 2025 14:29:33 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v2] In-Reply-To: References: Message-ID: > This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the name phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >
> Output with `-XX:PrintPhaseLevel=2`
>
>
>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java
> CompileCommand: compileonly TestLoop.test10 bool compileonly = true
> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true
> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes)
> 3584 99 b 3 TestLoop::test10 (64 bytes)
> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes)
> 1. After Parsing
> 2. Iter GVN 1
> 3. Incremental Inline
> 4. Incremental Boxing Inline
> 5. Before Loop Optimizations
> 6. PhaseIdealLoop 1
> 7. PhaseIdealLoop 2
> 8. PhaseIdealLoop 3
> 9. Before PhaseCCP 1
> 10. PhaseCCP 1
> 11. Iter GVN 2
> 12. PhaseIdealLoop iterations
> 13. After Loop Optimizations
> 14. After Macro Expansion
> 15. Barrier expand
> 16. Optimize finished
> 17. Before matching
> 18. After matching
> 19. Global code motion
> 20. Register Allocation
> 21. Final Code
> 3668 103 b 4 TestLoop::test10 (64 bytes)
> 1. After Parsing
> 2. Iter GVN 1
> 3. Incremental Inline
> 4. Incremental Boxing Inline
> 5. Before Loop Optimizations
> 6. PhaseIdealLoop 1
> 7. PhaseIdealLoop 2
> 8. PhaseIdealLoop 3
> 9. Before PhaseCCP 1
> 10. PhaseCCP 1
> 11. Iter GVN 2
> 12. PhaseIdealLoop iterations
> 13. PhaseIdealLoop iterations 2
> 14. PhaseIdealLoop iterations 3
> 15. PhaseIdealLoop iterations 4
> 16. PhaseIdealLoop iterations 5
> 17. PhaseIdealLoop iterations 6
> 18. PhaseIdealLoop iterations 7
> 19. PhaseIdealLoop iterations 8
> 20. PhaseIdealLoop iterations 9
> 21. After Loop Optimizations
> 22. After Macro Expansion
> 23. Barrier expand
> 24. Optimize finished
> 25. Before matching
> 26. After matching
> 27. Global code motion
> 28. Register Allocation
> 29. Final Code
>
>
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo... Manuel Hässig has updated the pull request incrementally with three additional commits since the last revision: - Make should_print_ideal_phase const - Clearer condition to prevent printing phases for stubs - Fix unintentional uintx ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25183/files - new: https://git.openjdk.org/jdk/pull/25183/files/d6403661..7e67b6f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25183/head:pull/25183 PR: https://git.openjdk.org/jdk/pull/25183 From shade at openjdk.org Mon May 12 14:33:40 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 14:33:40 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v15] In-Reply-To: References: Message-ID: <2LlyHKO14TOr7qVXQbyjy4ZWrGo8fCo3muVoa6VlFzc=.50816f66-e90b-4bb6-b953-64f6a675d664@github.com> > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in the queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch a weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. 
> > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in the number of methods in the queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/f239c221..33e545ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=13-14 Stats: 26 lines in 3 files changed: 14 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From mhaessig at openjdk.org Mon May 12 14:36:34 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 12 May 2025 14:36:34 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v3] In-Reply-To: References: Message-ID: > This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left sidebar in IGV) 
to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo... Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Make all of it const... ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25183/files - new: https://git.openjdk.org/jdk/pull/25183/files/7e67b6f1..2762193b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25183/head:pull/25183 PR: https://git.openjdk.org/jdk/pull/25183 From mhaessig at openjdk.org Mon May 12 14:39:51 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 12 May 2025 14:39:51 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:36:34 GMT, Manuel Hässig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left sidebar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>
After some offline discussions with @chhagedorn I made some changes: - fixed `Compile::_phase_count` having type `uintx` and fixed the related format string, - in `Compile::should_print_phase()` I now use `_method != nullptr` to detect that we are not compiling a stub, - made `Compile::should_print_ideal_phase()` const. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2872838896 From rcastanedalo at openjdk.org Mon May 12 14:48:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 12 May 2025 14:48:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 6 May 2025 13:28:28 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Thanks for looking at this PR, Emanuel! > It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. 
Right, I have prototyped this alternative in the wider context of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627) since it would be required for using writes as implicit null checks (both in ZGC and G1), and it indeed adds some complexity to `PhaseOutput` and other places (see https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks). I ran some preliminary experiments and could not see enough benefits to justify the additional complexity.

> In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer.

This cannot happen because the address we are loading from is constant through the barrier, see e.g. the code generated for a zLoadP in x64 (AT&T syntax):

    0x00007514c47d6aa0: movq 0x10(%rsi), %rax   ; main OOP load with implicit exception: dispatches to 0x00007514c47d6abe
    0x00007514c47d6aa4: shrq $0xd, %rax         ; uncolor, destroys the OOP loaded in %rax
    0x00007514c47d6aa8: ja 0x36                 ; jump to barrier stub (slow path)
    (...)
    0x00007514c47d6abe: trigger uncommon trap (null_check)
    (...)
    barrier stub (slow path):
    0x00007514c47d6ae4: movq 0x10(%rsi), %rax   ; re-load OOP that was destroyed by uncoloring
    (...)                                       ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
    0x00007514c47d6b09: jmp -0x5d               ; go back to main code section

Note how the address we might fault on (triggering the implicit exception) is stored on `%rsi` (base address) + `0x10` (field offset), which is not changed between the main load and the slow-path reload.

> I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. 
But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer is still valid in the barrier, and if not, we somehow directly translate the old pointer to a new pointer? Is that what the oop map is used for? I am not sure I understand the question, could you perhaps re-formulate it using some example to make it more concrete? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2872870543 From mhaessig at openjdk.org Mon May 12 14:59:29 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 12 May 2025 14:59:29 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v4] In-Reply-To: References: Message-ID: > This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left sidebar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo... Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: No tty lock needed for one print ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25183/files - new: https://git.openjdk.org/jdk/pull/25183/files/2762193b..a1b120f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25183/head:pull/25183 PR: https://git.openjdk.org/jdk/pull/25183 From mhaessig at openjdk.org Mon May 12 14:59:30 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 12 May 2025 14:59:30 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:36:34 GMT, Manuel Hässig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left sidebar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>
One last point from the offline discussion with Christian: taking the `tty_lock` is not required for a single print. Thus, I simplified the `print_phase()` method and removed the `tty_lock`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2872913452 From adinn at openjdk.org Mon May 12 15:19:57 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Mon, 12 May 2025 15:19:57 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 21:13:24 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - Fix win64 compile failures > > Signed-off-by: Ashutosh Mehra > - Fix AOTCodeFlags.java test > > Signed-off-by: Ashutosh Mehra > - Fix compile failure in minimal config > > Signed-off-by: Ashutosh Mehra > - Revert back changes that added AOTRuntimeConstants. 
> Ensure CompressedOops::base and CompressedKlssPointers::base does not > change in production run > > Signed-off-by: Ashutosh Mehra > - Fix merge conflicts > > Signed-off-by: Ashutosh Mehra > - Store/load AsmRemarks and DbgStrings in aot code cache > > Signed-off-by: Ashutosh Mehra > - Add missing external address in aarch64 > > Signed-off-by: Ashutosh Mehra > - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab Having discussed this with @fisk it appears that the weak reference load performed by the c2i adapters will not attempt a decode. The barrier load_at method only performs a decode when the decorators include `IN_HEAP`. `resolve_weak_handle` passes in the `IN_NATIVE` decorator which implies no decode should be performed. So, this means we can use the adapters even if the compressed oop base differs between training run and production run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2872988728 From kvn at openjdk.org Mon May 12 15:35:14 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 15:35:14 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v64] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 12:42:24 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. 
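To make the constraint set above concrete, here is a minimal C++ sketch of the membership test it describes. The names `IntConstraints`, `contains`, `loose`, `tight` and `same_set_on` are hypothetical and chosen only for illustration; this is not HotSpot's `TypeInt` API, and it does not implement the canonicalization itself. It only checks the six constraints for a given value and uses the [0, 3] example from the text to show that tightening the unsigned bounds and the known-zero bits leaves the represented set unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the constraint set described in the text.
// This is NOT HotSpot's TypeInt; it only mirrors the six constraints.
struct IntConstraints {
  int32_t  lo,  hi;   // signed bounds:   lo  s<= x s<= hi
  uint32_t ulo, uhi;  // unsigned bounds: ulo u<= x u<= uhi
  uint32_t zeros;     // bits that must be 0 in x
  uint32_t ones;      // bits that must be 1 in x
};

// True if x satisfies every constraint in t.
constexpr bool contains(const IntConstraints& t, int32_t x) {
  const uint32_t ux = static_cast<uint32_t>(x);  // same bits, unsigned view
  return x >= t.lo && x <= t.hi &&
         ux >= t.ulo && ux <= t.uhi &&
         (ux & t.zeros) == 0 &&
         (ux & t.ones) == t.ones;
}

// Signed range [0, 3] with no unsigned or bit information yet.
constexpr IntConstraints loose{0, 3, 0u, 0xFFFFFFFFu, 0u, 0u};

// The same set with the implied constraints made explicit:
// unsigned range [0, 3], and all bits except the lowest two known to be 0.
constexpr IntConstraints tight{0, 3, 0u, 3u, ~3u, 0u};

// Compare both descriptions on every value in [from, to].
constexpr bool same_set_on(int32_t from, int32_t to) {
  for (int32_t x = from; x <= to; ++x) {
    if (contains(loose, x) != contains(tight, x)) return false;
  }
  return true;
}

static_assert(same_set_on(-16, 16),
              "tightened constraints must describe the same values");
```

The point of the sketch is only that `loose` and `tight` accept exactly the same values, so preferring the tightened form loses nothing while exposing more information to later transformations.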
>> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add more intn_t tests This looks fine to me but to be on safe side lets push it into JDK 26 when it is forked. And I don't see link in RFE to recent testing of this. It needs to be tested in all tiers including tier10, xcomp and stress. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-2873033620 From kvn at openjdk.org Mon May 12 15:54:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 15:54:51 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Mon, 12 May 2025 12:58:37 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/code/aotCodeCache.cpp line 92: >> >>> 90: >>> 91: void AOTCodeCache::initialize() { >>> 92: #if !(defined(AMD64) || defined(AARCH64)) >> >> Sounds like we need to also take care about Zero? E.g. `defined(ZERO) || (!defined(AMD64) && !defined(AARCH64))`? > > Or invert it and put it at `else`, reads better and matches other blocks around Hotspot. > > > #if !defined(ZERO) && (defined(AMD64) || defined(AARCH64)) > ... > #else > log_info(aot, codecache, init)("AOT Code Cache is not supported on this platform."); > AOTAdapterCaching = false; > return; > #endif Right. Do we have check for Zero in tests? 
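As an aside on the guard discussed in this review thread, the two preprocessor conditions quoted above are logical negations of each other (De Morgan), so choosing between them is purely a readability question. The following self-contained C++ sketch checks that at compile time. `ZERO`, `AMD64` and `AARCH64` are the macros named in the review and are assumed to be supplied by the HotSpot build; `unsupported_form`, `supported_form` and `aot_code_cache_supported` are hypothetical names introduced here for illustration only.

```cpp
#include <cassert>

// ZERO, AMD64 and AARCH64 are assumed to come from the build system.
// A plain compile of this file normally sees none of them defined.

// First form from the review: the condition is true on UNSUPPORTED builds.
#if defined(ZERO) || (!defined(AMD64) && !defined(AARCH64))
constexpr bool unsupported_form = true;
#else
constexpr bool unsupported_form = false;
#endif

// Inverted form from the review: the condition is true on SUPPORTED builds.
#if !defined(ZERO) && (defined(AMD64) || defined(AARCH64))
constexpr bool supported_form = true;
#else
constexpr bool supported_form = false;
#endif

// Whatever macros are defined, the two forms must disagree.
static_assert(supported_form == !unsupported_form,
              "the two guards must be exact negations of each other");

// Hypothetical helper: what an initialization routine could consult
// before deciding to bail out early on unsupported platforms.
constexpr bool aot_code_cache_supported() { return supported_form; }
```

Compiling the file with different combinations such as `-DAMD64`, `-DAARCH64` or `-DAMD64 -DZERO` exercises each branch; the `static_assert` holds in all of them.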
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25158#discussion_r2084987757 From kvn at openjdk.org Mon May 12 16:10:30 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 16:10:30 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v3] In-Reply-To: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: > @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. > > AOT code caching should be limited to supported platforms: x64 and aarch64. > > Testing: GHA Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: - Exclude Zero from test - Exclude Zero ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25158/files - new: https://git.openjdk.org/jdk/pull/25158/files/0394bbc6..850cf4f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25158&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25158&range=01-02 Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25158.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25158/head:pull/25158 PR: https://git.openjdk.org/jdk/pull/25158 From shade at openjdk.org Mon May 12 16:10:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 16:10:30 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v3] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Mon, 12 May 2025 16:07:41 GMT, Vladimir Kozlov wrote: >> @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) 
integration. >> >> AOT code caching should be limited to supported platforms: x64 and aarch64. >> >> Testing: GHA > > Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: > > - Exclude Zero from test > - Exclude Zero Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25158#pullrequestreview-2833771502 From shade at openjdk.org Mon May 12 16:10:30 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 16:10:30 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Mon, 12 May 2025 15:52:04 GMT, Vladimir Kozlov wrote: > Do we have check for Zero in tests? I don't think we have a good way to do this with `@requires`, so it is fine to skip it for this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25158#discussion_r2085004455 From kvn at openjdk.org Mon May 12 16:10:30 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 16:10:30 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: <9_YpaMSnRV8XrV0jEkjQGc00dm1q002WxnafPF0gIX8=.880f633e-6c36-4f90-a7b4-49569f74ed29@github.com> On Mon, 12 May 2025 16:01:31 GMT, Aleksey Shipilev wrote: >> Right. Do we have check for Zero in tests? > >> Do we have check for Zero in tests? > > I don't think we have a good way to do this with `@requires`, so it is fine to skip it for this PR. I will keep current `#if/#else` order. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25158#discussion_r2085006755 From shade at openjdk.org Mon May 12 16:14:53 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 12 May 2025 16:14:53 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v3] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Mon, 12 May 2025 16:10:30 GMT, Vladimir Kozlov wrote: >> @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. >> >> AOT code caching should be limited to supported platforms: x64 and aarch64. >> >> Testing: GHA > > Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: > > - Exclude Zero from test > - Exclude Zero Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25158#pullrequestreview-2833791340 From kvn at openjdk.org Mon May 12 16:14:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 16:14:57 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 21:50:34 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2833790759 From kvn at openjdk.org Mon May 12 16:21:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 16:21:52 GMT Subject: RFR: 8356192: Enable AOT code caching only on supported platforms [v2] In-Reply-To: References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Mon, 12 May 2025 08:08:24 GMT, Martin Doerr wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Limit platforms to run AOTCode test > > LGTM. Thanks! Thank you @TheRealMDoerr and @shipilev for the reviews. I will wait for GHA to finish before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25158#issuecomment-2873201966 From kvn at openjdk.org Mon May 12 16:30:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 16:30:55 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:59:29 GMT, Manuel Hässig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the name phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>
>> Output with `-XX:PrintPhaseLevel=2` >> >> >>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java >> CompileCommand: compileonly TestLoop.test10 bool compileonly = true >> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true >> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) >> 3584 99 b 3 TestLoop::test10 (64 bytes) >> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. After Loop Optimizations >> 14. After Macro Expansion >> 15. Barrier expand >> 16. Optimize finished >> 17. Before matching >> 18. After matching >> 19. Global code motion >> 20. Register Allocation >> 21. Final Code >> 3668 103 b 4 TestLoop::test10 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. PhaseIdealLoop iterations 2 >> 14. PhaseIdealLoop iterations 3 >> 15. PhaseIdealLoop iterations 4 >> 16. PhaseIdealLoop iterations 5 >> 17. PhaseIdealLoop iterations 6 >> 18. PhaseIdealLoop iterations 7 >> 19. PhaseIdealLoop iterations 8 >> 20. PhaseIdealLoop iterations 9 >> 21. After Loop Optimizations >> 22. After Macro Expansion >> 23. Barrier expand >> 24. Optimize finished >> 25. Before matching >> 26. After matching >> 27. Global code motion >> 28. Registe... 
> > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > No tty lock needed for one print src/hotspot/share/opto/compile.hpp line 671: > 669: bool should_print_igv(int level); > 670: bool should_print_phase(int level) const; > 671: bool should_print_ideal_phase(CompilerPhaseType cpt) const; You can use the macro `PRODUCT_RETURN_(return false;);` and put both methods under `#ifndef PRODUCT` in the .cpp file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25183#discussion_r2085046167 From iklam at openjdk.org Mon May 12 17:20:56 2025 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 12 May 2025 17:20:56 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 21:50:34 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments LGTM. Small nits about logging and testing. src/hotspot/share/cds/filemap.cpp line 1955: > 1953: " does not equal the current SpecTrapLimitExtraEntries setting (%d).", file_type, > 1954: _spec_trap_limit_extra_entries, SpecTrapLimitExtraEntries); > 1955: return false; The `log_info(cds)` should be replaced with `MetaspaceShared::report_loading_error`. (The few `log_info` lines above this block will be fixed in [JDK-8356807](https://bugs.openjdk.org/browse/JDK-8356807)) Also, could you add a new jtreg test case for this?
You can see examples in `negativeTests` in the existing AOTFlags.java test case. I think you can add your checks into the new AOTProfileFlags.java test. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2833939411 PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2085115469 From mbaesken at openjdk.org Mon May 12 18:02:07 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Mon, 12 May 2025 18:02:07 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures Message-ID: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be checked to avoid crashes. ------------- Commit messages: - JDK-8356778 Changes: https://git.openjdk.org/jdk/pull/25188/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25188&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356778 Stats: 24 lines in 3 files changed: 23 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25188/head:pull/25188 PR: https://git.openjdk.org/jdk/pull/25188 From asmehra at openjdk.org Mon May 12 18:24:30 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 12 May 2025 18:24:30 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v3] In-Reply-To: References: Message-ID: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs.
> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision: - Add test for using AOTCodeCache with different CompressedOops configuration Signed-off-by: Ashutosh Mehra - Add check for compressed oops base address; minor refacotring Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25019/files - new: https://git.openjdk.org/jdk/pull/25019/files/ba612dab..9fcc91b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=01-02 Stats: 233 lines in 5 files changed: 202 ins; 14 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/25019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25019/head:pull/25019 PR: https://git.openjdk.org/jdk/pull/25019 From lmesnik at openjdk.org Mon May 12 18:35:52 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 12 May 2025 18:35:52 GMT Subject: RFR: 8356702: CTW: Update modules In-Reply-To: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> References: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> Message-ID: On Mon, 12 May 2025 05:57:46 GMT, Evgeny Nikitin wrote: > This PR enhances CTW test wrappers generator in order to make it more user-friendly. Added features are: > > 1. Automatic scanning for modules list under `open/src` > 2. Automatic recognition of current year; > 3. Multi-wrapper modules support (allows for splitting huge modules into 2 and more wrappers) > 4. ability to exclude modules; > > The updated generator have been used to refresh JTReg module wrappers. > The most meaningful change is contained in the `generate.bash` > Testing: `open/test/hotspot/jtreg/applications/ctw/modules` with the supported platforms, no failures spotted. Changes requested by lmesnik (Reviewer). 
test/hotspot/jtreg/applications/ctw/modules/java_base.java line 2: > 1: /* > 2: * Copyright (c) 2017, 2025, Oracle and/or its affiliates. All rights reserved. There is no need to update copyrights if nothing was changed in this file. Please just remove such files from the commit. ------------- PR Review: https://git.openjdk.org/jdk/pull/25175#pullrequestreview-2834136754 PR Review Comment: https://git.openjdk.org/jdk/pull/25175#discussion_r2085234069 From dlong at openjdk.org Mon May 12 18:49:58 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 12 May 2025 18:49:58 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 09:25:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode.
Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add jsr to falls_through() What makes the debug info "before" or "after" is the state of the stack relative to the bci. So if we are doing a getfield, and have already popped the object off the stack and pushed the result, then the state would be "after". However, we then advance the bci, making the state "before", because we are now on a different bytecode. Since uncommon traps have reexecute semantics, the interpreter will continue executing at that bci without advancing it first, so we need a "before" state in that case. That's why trying to compute an "after" state doesn't make sense to me for uncommon traps. They should all use a "before" state.
In other deoptimization cases without reexecute semantics, we may cause the interpreter to advance to the bci that follows the one given in the debug info. In this situation, the debug info would need to be the "after" state, and bci in the debug info would need to be one that falls through. I plan to take a closer look at this VerifyStack code and possibly improve or remove it, but not as part of this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2873653422 From iveresov at openjdk.org Mon May 12 19:29:18 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 12 May 2025 19:29:18 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v15] In-Reply-To: References: Message-ID: <-1w2fZRYCt4wwXkNQAmceLjlcPrE8URSeJKzB43PWBQ=.cb013019-c6ae-4cce-b750-c0d38e420c92@github.com> > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits: - Merge branch 'master' into pp2 - Address review comments - Merge branch 'master' into pp2 - Fix compile - Fix additional issues - Make sure command line flags that affect MDO layout are consistent - Fix semantics change from the previous commit - Port 8355915: [leyden] Crash in MDO clearing the unloaded array type - Fix flag behavior - Fix log tags - ... 
and 35 more: https://git.openjdk.org/jdk/compare/45dfc2c6...ae332887 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=14 Stats: 3226 lines in 59 files changed: 3011 ins; 100 del; 115 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From heidinga at openjdk.org Mon May 12 20:05:55 2025 From: heidinga at openjdk.org (Dan Heidinga) Date: Mon, 12 May 2025 20:05:55 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v15] In-Reply-To: <-1w2fZRYCt4wwXkNQAmceLjlcPrE8URSeJKzB43PWBQ=.cb013019-c6ae-4cce-b750-c0d38e420c92@github.com> References: <-1w2fZRYCt4wwXkNQAmceLjlcPrE8URSeJKzB43PWBQ=.cb013019-c6ae-4cce-b750-c0d38e420c92@github.com> Message-ID: <931WapeeMqp8gbb720PmF0inR4k_kD9CL9UL9SGDyWY=.2406d164-dc7e-49ce-9e26-78f0a7f3f9e6@github.com> On Mon, 12 May 2025 19:29:18 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 45 commits: > > - Merge branch 'master' into pp2 > - Address review comments > - Merge branch 'master' into pp2 > - Fix compile > - Fix additional issues > - Make sure command line flags that affect MDO layout are consistent > - Fix semantics change from the previous commit > - Port 8355915: [leyden] Crash in MDO clearing the unloaded array type > - Fix flag behavior > - Fix log tags > - ... 
and 35 more: https://git.openjdk.org/jdk/compare/45dfc2c6...ae332887 src/hotspot/share/cds/filemap.hpp line 120: > 118: bool _compressed_class_ptrs; // save the flag UseCompressedClassPointers > 119: int _narrow_klass_pointer_bits; // save number of bits in narrowKlass > 120: int _narrow_klass_shift; // save shift width used to pre-compute narrowKlass IDs in archived heap objectsa Suggestion: int _narrow_klass_shift; // save shift width used to pre-compute narrowKlass IDs in archived heap objects Minor typo, don't bother fixing if it will result in a re-review cycle ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2085360465 From asmehra at openjdk.org Mon May 12 20:10:13 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Mon, 12 May 2025 20:10:13 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v4] In-Reply-To: References: Message-ID: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision: - Remove more unused code Signed-off-by: Ashutosh Mehra - Fix whitespace issue. Remove unused code. 
Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25019/files - new: https://git.openjdk.org/jdk/pull/25019/files/9fcc91b3..98e5fa07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=02-03 Stats: 11 lines in 1 file changed: 0 ins; 10 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25019/head:pull/25019 PR: https://git.openjdk.org/jdk/pull/25019 From dnsimon at openjdk.org Mon May 12 20:17:02 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 12 May 2025 20:17:02 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true Message-ID: By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). 2. Stop the VM before any application code can be executed. This is just good hygiene. This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. This is done for both libgraal and jargraal so that the behavior is uniform. 
Since jargraal is now a development configuration, VM startup costs are not critical. ------------- Commit messages: - only fail-fast for a missing JVMCI compiler on a HotSpot JIT thread - default EagerJVMCI to true if UseJVMCICompiler is true Changes: https://git.openjdk.org/jdk/pull/25121/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25121&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356447 Stats: 34 lines in 6 files changed: 31 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25121/head:pull/25121 PR: https://git.openjdk.org/jdk/pull/25121 From kvn at openjdk.org Mon May 12 20:32:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 20:32:56 GMT Subject: Integrated: 8356192: Enable AOT code caching only on supported platforms In-Reply-To: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> References: <0jJANchQvZPwpb-L02B47hWK9eUIPoZviMEWx1a4Gpo=.d75f0315-bda0-48a7-bffe-4a3af898e24d@github.com> Message-ID: On Fri, 9 May 2025 21:53:24 GMT, Vladimir Kozlov wrote: > @TheRealMDoerr reported failures in `runtime/cds/appcds` testing on PPC64 after [JDK-8350209](https://bugs.openjdk.org/browse/JDK-8350209) integration. > > AOT code caching should be limited to supported platforms: x64 and aarch64. > > Testing: GHA This pull request has now been integrated. 
Changeset: 2595fcc7 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/2595fcc7cc49912d8ac54803a5f74e6f0a45f06f Stats: 9 lines in 2 files changed: 9 ins; 0 del; 0 mod 8356192: Enable AOT code caching only on supported platforms Reviewed-by: shade, mdoerr, fyang ------------- PR: https://git.openjdk.org/jdk/pull/25158 From kvn at openjdk.org Mon May 12 20:39:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 20:39:51 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true In-Reply-To: References: Message-ID: On Thu, 8 May 2025 14:44:55 GMT, Doug Simon wrote: > By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. > > The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: > 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). > 2. Stop the VM before any application code can be executed. This is just good hygiene. > > This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. > This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. 
src/hotspot/share/jvmci/jvmci_globals.cpp line 91: > 89: if (FLAG_IS_DEFAULT(EagerJVMCI) && !EagerJVMCI) { > 90: FLAG_SET_DEFAULT(EagerJVMCI, true); > 91: } The default value is `false` - I don't think you need check it. You can use `FLAG_SET_ERGO_IF_DEFAULT(EagerJVMCI, true);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25121#discussion_r2085425314 From vlivanov at openjdk.org Mon May 12 21:04:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 12 May 2025 21:04:58 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Sat, 10 May 2025 05:24:02 GMT, Quan Anh Mai wrote: > I think a very simple approach you can take is having CallPureNode as a pure data node It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2874057369 From iveresov at openjdk.org Mon May 12 21:30:12 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Mon, 12 May 2025 21:30:12 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v16] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Make the test permute through default flag values too ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/ae332887..d75cb5dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=14-15 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From sparasa at openjdk.org Mon May 12 22:46:10 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 12 May 2025 22:46:10 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: > This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. 
> > The test passes after using this fix as shown below: > > Passed: compiler/c2/irTests/TestFPComparison.java > Test results: passed: 1 > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java > 1 1 0 0 0 > ============================== > TEST SUCCESS Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: add TEMP(dst) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25101/files - new: https://git.openjdk.org/jdk/pull/25101/files/ec2959b2..3fb569cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25101&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25101&range=01-02 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25101.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25101/head:pull/25101 PR: https://git.openjdk.org/jdk/pull/25101 From sparasa at openjdk.org Mon May 12 22:46:11 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 12 May 2025 22:46:11 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 16:54:10 GMT, Quan Anh Mai wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> add TEMP(dst) > > src/hotspot/cpu/x86/x86_64.ad line 6273: > >> 6271: ins_cost(200); >> 6272: format %{ "ecmovpl $dst, $src1, $src2\n\t" >> 6273: "cmovnel $dst, $src2" %} > > You need `effect(TEMP dst)` for these nodes, too. Thank you! Please see the `effect(TEMP dst)` added for all the match rules. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25101#discussion_r2085634487 From sviswanathan at openjdk.org Mon May 12 22:49:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 12 May 2025 22:49:53 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 22:46:10 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. >> >> The test passes after using this fix as shown below: >> >> Passed: compiler/c2/irTests/TestFPComparison.java >> Test results: passed: 1 >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java >> 1 1 0 0 0 >> ============================== >> TEST SUCCESS > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add TEMP(dst) Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25101#pullrequestreview-2834786476 From kvn at openjdk.org Mon May 12 22:51:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 22:51:55 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 20:10:13 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. 
> > Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision: > > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. > > Signed-off-by: Ashutosh Mehra src/hotspot/share/code/aotCodeCache.hpp line 169: > 167: class Config { > 168: uint _compressedOopShift; > 169: address _compressedOopBase; Put it first to avoid gaps between fields. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2085638902 From kvn at openjdk.org Mon May 12 23:04:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 23:04:54 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 20:10:13 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision: > > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. > > Signed-off-by: Ashutosh Mehra test/hotspot/jtreg/runtime/cds/appcds/aotCode/AOTCodeCompressedOopsTest.java line 48: > 46: import jdk.test.lib.process.OutputAnalyzer; > 47: > 48: public class AOTCodeCompressedOopsTest { Ioi also suggested a test example, which I attached to the RFE in JBS. Your test looks very similar and covers more.
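An editorial aside on the earlier remark about placing `_compressedOopBase` first in `Config`: the sketch below only illustrates the general padding effect the reviewer seems to have in mind. The struct and member names are hypothetical stand-ins, not the real `Config` layout, and the sizes in the comments assume a typical LP64 target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the fields under discussion; the real Config
// class in aotCodeCache.hpp has a different and longer member list.
struct ConfigShiftFirst {
  uint32_t shift;      // 4 bytes; LP64 inserts 4 bytes of padding after it
  void*    base;       // pointer-sized, needs pointer alignment
  uint32_t alignment;  // 4 bytes; LP64 adds 4 bytes of tail padding
};

struct ConfigBaseFirst {
  void*    base;       // pointer-sized member goes first
  uint32_t shift;      // the two 32-bit members now pack together
  uint32_t alignment;
};

// Putting the pointer first never makes this layout larger. On a typical
// LP64 target it shrinks from 24 to 16 bytes; on ILP32 both are the same.
static_assert(sizeof(ConfigBaseFirst) <= sizeof(ConfigShiftFirst),
              "pointer-first ordering should not increase the size");

// Bytes saved by the reordering on the current target (may be zero).
constexpr std::size_t bytes_saved() {
  return sizeof(ConfigShiftFirst) - sizeof(ConfigBaseFirst);
}
```

Whether the real class gains anything depends on its full member list, which is not shown in this thread.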
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2085648044 From kvn at openjdk.org Mon May 12 23:09:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 23:09:55 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: <8W_FRkLbamdZ6l0Lkbn8WqXv_JXPjG-i5hBus2foor4=.4f80cd55-4141-46ff-8436-0cbbc9228461@github.com> References: <8W_FRkLbamdZ6l0Lkbn8WqXv_JXPjG-i5hBus2foor4=.4f80cd55-4141-46ff-8436-0cbbc9228461@github.com> Message-ID: On Thu, 8 May 2025 01:33:54 GMT, Ashutosh Mehra wrote: >> I think for these changes we should not use AOT code when the heap base does not match. >> Something changed in compressed oops code which prevents enforcing encoding. >> We can investigate and fix it later. > >> I think for these changes we should not use AOT code when the heap base does not match. > Something changed in compressed oops code which prevents enforcing encoding. > We can investigate and fix it later. > > @vnkozlov for this PR we are relying on having relocation for COOP base, not on enforcing encoding. And that should be able to handle cases where heap base is different in assembly vs prod. Why do you suggest to not use AOT code when the heap base does not match? @ashu-mehra, this looks good with few comments. After you address them, please merge latest jdk - I pushed small related change to limit platforms to run with AOT. After that I will submit new testing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2874413950 From kvn at openjdk.org Mon May 12 23:09:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 12 May 2025 23:09:56 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 20:10:13 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision: > > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. > > Signed-off-by: Ashutosh Mehra src/hotspot/share/code/aotCodeCache.hpp line 374: > 372: > 373: static bool is_dumping_stubs() NOT_CDS_RETURN_(false); > 374: static bool is_using_stubs() NOT_CDS_RETURN_(false); We use singular naming (`is_dumping_stub()`) for these methods in the `premain` branch. I was debating whether to file a separate RFE for renaming them in mainline, or maybe you can do it here. It is up to you. I did not pay attention to these names when I worked on adapter caching, but now that I have to merge from mainline to premain I noticed the difference.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2085650099 From bkilambi at openjdk.org Mon May 12 23:24:10 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Mon, 12 May 2025 23:24:10 GMT Subject: Integrated: 8355708: Two Float16 IR tests fail after JDK-8345125 In-Reply-To: References: Message-ID: On Fri, 9 May 2025 13:17:42 GMT, Bhavana Kilambi wrote: > Two FP16 tests fail due to IR verification failure in JTREG. Increased the warmup time to 10000 to make sure it is being compiled by c2 and the expected IR is being generated. > > Testing: > Tested both the testcases with and without these options - `"-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation"` and they pass successfully on aarch64. This pull request has now been integrated. Changeset: 303f4101 Author: Bhavana Kilambi Committer: Hao Sun URL: https://git.openjdk.org/jdk/commit/303f4101d44835b9c62f46d89137ad218228c132 Stats: 5 lines in 3 files changed: 2 ins; 3 del; 0 mod 8355708: Two Float16 IR tests fail after JDK-8345125 Reviewed-by: jbhateja, haosun, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25141 From qamai at openjdk.org Tue May 13 03:14:55 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 May 2025 03:14:55 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: <7e0IhYYv_1dDlLgmUM8rKj5bjDx3lIhY2PRt-fC-rTs=.35437a80-80c7-4332-9339-a6f047b73289@github.com> On Mon, 12 May 2025 21:01:34 GMT, Vladimir Ivanov wrote: >> I think a very simple approach you can take is having `CallPureNode` as a pure data node. 
It does not have to have anything to do with `CallNode` (no lowering into a `CallNode`, no subclass from `CallNode`) and it can have its mach implementation like this:
>>
>> instruct pureCall1F(xmm0 dst, xmm0 src) %{
>>   match(Set dst (CallPure src));
>>   effect(CALL);
>>   format %{
>>     __ call(/*something*/);
>>   %}
>> %}
>
>> I think a very simple approach you can take is having CallPureNode as a pure data node
>
> It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account.
@iwanowww I believe `effect(CALL)` marks that a call is taking place and the register allocator will know how to save the registers accordingly. Note that on arm, long division is implemented as a call: https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/cpu/arm/arm.ad#L5962 And `SharedRuntime::ldiv` is implemented in C++: https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/share/runtime/sharedRuntime.cpp#L272 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2874936879 From duke at openjdk.org Tue May 13 03:17:35 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 13 May 2025 03:17:35 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: <5COzHHI1_lTmKfS_8r2GGxVrOJNlHF6Hh1ACF0RgVmM=.b8546ba0-a9e4-4f38-b2f9-aa6116bd464a@github.com> > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory.
This intrinsic considerably reduces Unsafe::setMemory time.
>
> On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the following results.
>
> Before the patch:
> `Benchmark (aligned) (size) Mode Cnt Score Error Units
> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op
> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op
> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op
> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op
> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op
> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op
> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op
> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op
> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op
> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op
> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op
> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op
> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op
> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op
> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op
> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op
> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op
> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op
> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op
> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op
> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op
> MemorySegmentZeroUnsafe.panama false 63 avgt 30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: optimize mv and some order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/5b3fc807..85167f80 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=02-03 Stats: 165 lines in 1 file changed: 85 ins; 73 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From fyang at openjdk.org Tue May 13 03:20:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 13 May 2025 03:20:56 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization In-Reply-To: References: Message-ID: On Mon, 12 May 2025 11:35:40 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. > Thanks! > > ## Test > > performance test running in progress ... Two comments after a cursory look. Thanks. src/hotspot/cpu/riscv/riscv_v.ad line 125: > 123: return UseZvfh; > 124: case Op_FmaVHF: > 125: return UseZvfh && UseFMA; Maybe group with the existing two cases at L98 and L99 (Op_VectorCastHF2F / Op_VectorCastF2HF)? src/hotspot/cpu/riscv/riscv_v.ad line 382: > 380: ins_encode %{ > 381: assert(UseZvfh, "must"); > 382: BasicType bt = Matcher::vector_element_basic_type(this); Question: What is `bt` calculated here? Seems there isn't one for HF16 in `enum BasicType` definition in file src/hotspot/share/utilities/globalDefinitions.hpp. I only see `T_FLOAT` and `T_DOUBLE`, which I don't think is usable here as we need to set SEW=16 for this instruction. 
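The SEW concern raised above can be illustrated with a minimal sketch. It is not HotSpot code: `ElemType`, `elem_size_in_bytes`, `sew_bits` and `usable_for_fp16` are hypothetical names invented for this illustration, and the idea that a 16-bit integer element type could stand in for half-precision lanes is an assumption that this thread does not confirm. The sketch only shows why a 32-bit `T_FLOAT`-like type would give the wrong element width for an FP16 vector instruction.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for HotSpot's BasicType enum; the
// names below are illustrative and are not the real definitions.
enum class ElemType { Byte, Short, Int, Long, Float, Double };

// Element size in bytes for each illustrative type.
int elem_size_in_bytes(ElemType t) {
  switch (t) {
    case ElemType::Byte:   return 1;
    case ElemType::Short:  return 2;
    case ElemType::Int:    return 4;
    case ElemType::Float:  return 4;
    case ElemType::Long:   return 8;
    case ElemType::Double: return 8;
  }
  return 0;  // not reached for valid enumerators
}

// SEW (selected element width) in bits, as a vsetvli-style helper would
// have to derive it from the element type.
int sew_bits(ElemType t) {
  return elem_size_in_bytes(t) * 8;
}

// Half-precision lanes are 16 bits wide, so only an element type that is
// 16 bits wide yields the SEW an FP16 vector instruction needs.
bool usable_for_fp16(ElemType t) {
  return sew_bits(t) == 16;
}
```

Under these assumptions, a `Float`-like element type yields SEW=32 and a `Short`-like one yields SEW=16, which is the distinction the question is about.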
src/hotspot/cpu/riscv/riscv_v.ad line 383: > 381: assert(UseZvfh, "must"); > 382: BasicType bt = Matcher::vector_element_basic_type(this); > 383: __ vsetvli_helper(bt, Matcher::vector_length(this)); Question: What is `bt` calculated here? Seems there isn't one for HF16 in `enum BasicType` definition in file src/hotspot/share/utilities/globalDefinitions.hpp. I only see `T_FLOAT` and `T_DOUBLE`, which I don't think is usable here as we need to set SEW=16 for this instruction. ------------- PR Review: https://git.openjdk.org/jdk/pull/25181#pullrequestreview-2835167722 PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2085849317 PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2085848619 PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2085848726 From iveresov at openjdk.org Tue May 13 03:31:42 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 03:31:42 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v17] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/cds/filemap.hpp Co-authored-by: Dan Heidinga ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/d75cb5dc..da4a3420 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=15-16 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From duke at openjdk.org Tue May 13 03:32:37 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 13 May 2025 03:32:37 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic?s generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time > > on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below > > before the patch > `Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ? 0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ? 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ? 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ? 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ? 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ? 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ? 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ? 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ? 
0.195 ns/op
> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op
> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op
> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op
> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op
> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op
> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op
> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op
> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op
> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op
> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op
> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op
> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op
> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op
> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op
> MemorySegmentZeroUnsafe.panama false 63 avgt 30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: delte some useless whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/85167f80..94ae6ad5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From iveresov at openjdk.org Tue May 13 03:34:54 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 03:34:54 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 17:13:55 GMT, Ioi Lam wrote: >> Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/cds/filemap.cpp line 1955: > >> 1953: " does not equal the current SpecTrapLimitExtraEntries setting (%d).", file_type, >> 1954: _spec_trap_limit_extra_entries, SpecTrapLimitExtraEntries); >> 1955: return false; > > The `log_info(cds)` should be replaced with `MetaspaceShared::report_loading_error`. (The few `log_info` lines above this block will be fixed in [JDK-8356807](https://bugs.openjdk.org/browse/JDK-8356807)) > > Also, could you add a new jtreg test case for this? You can see examples in `negativeTests` in the existing AOTFlags.java test case. I think you can add your checks into the new AOTProfileFlags.java test. Do you want me to leave the existing `log_info` alone? Or should I fix everything in `FileMapHeader::validate()` ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2085858973 From dlong at openjdk.org Tue May 13 03:47:51 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 13 May 2025 03:47:51 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder In-Reply-To: References: Message-ID: On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote: > Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. > > Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers` to rather use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag is used in testing since october 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default, and instead, add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improve the performance to match the ones of C1 (both for direct array allocation, and `StringBuilder` construction). 
For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`. This may be a dumb question, but why would we want to optimize throwing NegativeArraySizeException? Maybe I'm missing something, but I would be surprised if any real apps would be helped by this fix. I know we have some optimizations for exception throwing, but I would expect those optimizations to be most useful for unavoidable exceptions, like NoClassDefFoundError. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25149#issuecomment-2874973027 From duke at openjdk.org Tue May 13 03:48:35 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 13 May 2025 03:48:35 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v6] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic?s generator generate_unsafe_setmemory. 
This intrinsic considerably reduces Unsafe::setMemory time.
>
> On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the following results.
>
> Before the patch:
> `Benchmark (aligned) (size) Mode Cnt Score Error Units
> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op
> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op
> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op
> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op
> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op
> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op
> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op
> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op
> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op
> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op
> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op
> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op
> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op
> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op
> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op
> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op
> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op
> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op
> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op
> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op
> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op
> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op
> MemorySegmentZeroUnsafe.panama false 63 avgt 30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: fix some format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/94ae6ad5..50df6444 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=04-05 Stats: 190 lines in 1 file changed: 90 ins; 99 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Tue May 13 05:52:32 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 13 May 2025 05:52:32 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v7] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic?s generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time > > on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below > > before the patch > `Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ? 0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ? 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ? 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ? 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ? 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ? 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ? 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ? 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ? 0.195 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ? 0.019 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ? 
2.795 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ? 0.936 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ? 0.879 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ? 0.756 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ? 0.088 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ? 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ? 0.037 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ? 0.010 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ? 0.020 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ? 0.016 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ? 0.086 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ? 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ? 0.043 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ? 0.228 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30... Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: change the unsafe parm to false ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/50df6444..918bf5aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From chagedorn at openjdk.org Tue May 13 06:10:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 May 2025 06:10:52 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases In-Reply-To: References: Message-ID: <5K47IsqYMjuNzxd9ZyBuejxLNpw_y9F0-SaVAEFY3yo=.b41fb57a-78a8-425d-b405-319a13800b0a@github.com> On Mon, 12 May 
2025 13:40:54 GMT, Roberto Casta?eda Lozano wrote: > Thanks for working on the Manuel, looks very useful! Have you considered using the Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation` that seems related to this. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. > > We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed. UL is definitely the long-term solution. But given that we have more levels with this new flag (-1 to 6) than UL provides (trace, info, etc.), how could we do it with UL? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2875160148 From epeter at openjdk.org Tue May 13 06:10:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 06:10:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Mon, 12 May 2025 14:46:22 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Thanks for looking at this PR, Emanuel! > >> It is a limitation that we require the first operation to be the memory access. But the alternative would probably be significantly more complicated, i.e. to track the location of all the memory locations. 
> > Right, I have prototyped this alternative in the wider context of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627) since it would be required for using writes as implicit null checks (both in ZGC and G1), and it indeed adds some complexity to `PhaseOutput` and other places (see https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks). I ran some preliminary experiments and could not see enough benefits to justify the additional complexity. > >> In our offline discussion, I had some hesitation about the case where the load is at the beginning, but the barrier may have more loads. I wondered: what if the first load does not trigger the NullPointerException, but a later load then encounters the null pointer. > > This cannot happen because the address we are loading from is constant through the barrier, see e.g. the code generated for a zLoadP in x64 (AT&T syntax): > > > 0x00007514c47d6aa0: movq 0x10(%rsi), %rax ; main OOP load with implicit exception: dispatches to 0x00007514c47d6abe > 0x00007514c47d6aa4: shrq $0xd, %rax ; uncolor, destroys the OOP loaded in %rax > 0x00007514c47d6aa8: ja 0x36 ; jump to barrier stub (slow path) > > (...) > > 0x00007514c47d6abe: trigger uncommon trap (null_check) > > (...) > > barrier stub (slow path): > 0x00007514c47d6ae4: movq 0x10(%rsi), %rax ; re-load OOP that was destroyed by uncoloring > (...) ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*)) > 0x00007514c47d6b09: jmp -0x5d ; go back to main code section > > > Note how the address we might fault on (triggering the implicit exception) is stored on `%rsi` (base address) + `0x10` (field offset), which is not changed between the main load and the slow-path reload. > >> I think I was also worried that we would re-load the pointer itself. Then the old pointer may be non-null, but once we load the pointer again it may be null because another thread changed the reference. 
But now I thought about that again: that would really violate the Java Memory Model, you cannot duplicate the load of the pointer. So I suppose rather we got the old pointer from somewhere, and then we check if that old pointer ... @robcasloz Thanks for the explanations! I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do. I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875161021 From epeter at openjdk.org Tue May 13 06:19:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 06:19:52 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> On Tue, 6 May 2025 13:28:28 GMT, Roberto Casta?eda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). 
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). If I understand your statements above correctly: The first load and any subsequent loads are all from the **exact same** address. Hence, if any were null-pointer, the first one has to be a null-pointer. Assuming this is correct, it seems that this follows: Assuming the pointer is not a null-pointer, then wherever it points to cannot be moved by the GC. 
In your example code above, `0x10(%rsi)` is the address, and presumably `rsi` refers to the base of some object, and `0x10` is the offset to a field. The object that `rsi` points to can thus not be moved by the GC, correct? But the object that the field at offset `0x10` points to may have been moved, and that is why we check its coloring, and then re-load from that field later. Does that sound correct to you? What guarantees that the object associated with `rsi` is not moved by the GC? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875176654 From dnsimon at openjdk.org Tue May 13 06:52:27 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 06:52:27 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: References: Message-ID: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> > By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. > > The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: > 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). > 2. Stop the VM before any application code can be executed. This is just good hygiene. 
> > This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. > This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: use FLAG_SET_ERGO_IF_DEFAULT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25121/files - new: https://git.openjdk.org/jdk/pull/25121/files/42c351b5..ad4be5dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25121&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25121&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25121.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25121/head:pull/25121 PR: https://git.openjdk.org/jdk/pull/25121 From rcastanedalo at openjdk.org Tue May 13 07:01:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 May 2025 07:01:51 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases In-Reply-To: <5K47IsqYMjuNzxd9ZyBuejxLNpw_y9F0-SaVAEFY3yo=.b41fb57a-78a8-425d-b405-319a13800b0a@github.com> References: <5K47IsqYMjuNzxd9ZyBuejxLNpw_y9F0-SaVAEFY3yo=.b41fb57a-78a8-425d-b405-319a13800b0a@github.com> Message-ID: On Tue, 13 May 2025 06:08:08 GMT, Christian Hagedorn wrote: > > Thanks for working on this, Manuel, looks very useful! Have you considered using Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation` that seems related to this. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. > > We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed. > > UL is definitely the long-term solution.
But given that we have more levels with this new flag (-1 to 6) than UL provides (trace, info, etc.), how could we do it with UL? Good point, I guess we would have to remap the IGV print levels to the (fewer) UL logging levels. I think that would be OK, we probably do not need that many different print levels for IGV anyway. But I am also OK with adding a new JVM flag in the context of this RFE and revisiting it when migrating to UL. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2875272920 From enikitin at openjdk.org Tue May 13 07:24:14 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 13 May 2025 07:24:14 GMT Subject: RFR: 8356702: CTW: Update modules [v2] In-Reply-To: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> References: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> Message-ID: <3s1wwqmKLRzxZw2FL-s48FvSEcmzlYle_qZBn7YYvOE=.a09bdd82-7186-405a-a076-a2934b9bb3b3@github.com> > This PR enhances CTW test wrappers generator in order to make it more user-friendly. Added features are: > > 1. Automatic scanning for modules list under `open/src` > 2. Automatic recognition of current year; > 3. Multi-wrapper modules support (allows for splitting huge modules into 2 and more wrappers) > 4. ability to exclude modules; > > The updated generator have been used to refresh JTReg module wrappers. > The most meaningful change is contained in the `generate.bash` > Testing: `open/test/hotspot/jtreg/applications/ctw/modules` with the supported platforms, no failures spotted. Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: Revert "Update modified wrappers" This reverts commit d7122ccbf3b03a3c43917656ad209624910f6230. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25175/files - new: https://git.openjdk.org/jdk/pull/25175/files/d7122ccb..13a0b97b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25175&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25175&range=00-01 Stats: 68 lines in 67 files changed: 1 ins; 0 del; 67 mod Patch: https://git.openjdk.org/jdk/pull/25175.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25175/head:pull/25175 PR: https://git.openjdk.org/jdk/pull/25175 From enikitin at openjdk.org Tue May 13 07:24:14 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Tue, 13 May 2025 07:24:14 GMT Subject: RFR: 8356702: CTW: Update modules [v2] In-Reply-To: References: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> Message-ID: <4THPYSmKj3S1xAxKq7kspwHZckApuUJ7FVfjQJUnbW8=.a29583e4-6917-46dc-b579-c861a48410d6@github.com> On Mon, 12 May 2025 18:32:08 GMT, Leonid Mesnik wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "Update modified wrappers" >> >> This reverts commit d7122ccbf3b03a3c43917656ad209624910f6230. > > test/hotspot/jtreg/applications/ctw/modules/java_base.java line 2: > >> 1: /* >> 2: * Copyright (c) 2017, 2025, Oracle and/or its affiliates. All rights reserved. > > It is not needed to update copyrights if nothing was changed in this file. Please just remove such files from commit. Reverted 'em. Please check. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25175#discussion_r2086103335 From iklam at openjdk.org Tue May 13 07:45:07 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 13 May 2025 07:45:07 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 03:32:13 GMT, Igor Veresov wrote: >> src/hotspot/share/cds/filemap.cpp line 1955: >> >>> 1953: " does not equal the current SpecTrapLimitExtraEntries setting (%d).", file_type, >>> 1954: _spec_trap_limit_extra_entries, SpecTrapLimitExtraEntries); >>> 1955: return false; >> >> The `log_info(cds)` should be replaced with `MetaspaceShared::report_loading_error`. (The few `log_info` lines above this block will be fixed in [JDK-8356807](https://bugs.openjdk.org/browse/JDK-8356807)) >> >> Also, could you add a new jtreg test case for this? You can see examples in `negativeTests` in the existing AOTFlags.java test case. I think you can add your checks into the new AOTProfileFlags.java test. > > Do you want me to leave the existing `log_info` alone? Or should I fix everything in `FileMapHeader::validate()` ? You can leave the existing code and just fix the new code you added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2086143903 From mchevalier at openjdk.org Tue May 13 08:02:04 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 13 May 2025 08:02:04 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder In-Reply-To: References: Message-ID: On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote: > Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. 
> > Here, the problem is that the array allocation function, which throws when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers`, which uses compiled handlers instead of deopting, was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag has been used in testing since October 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default and add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improves performance to match that of C1 (both for direct array allocation, and `StringBuilder` construction). For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`. I agree it's a corner case and it should matter little in real life. Nevertheless, having worse performance in C2 than in C1 never looks good. But to be honest, if the problem were more involved, and the fix more complicated, it's not clear to me whether it would be worth it just for that, indeed. BUT!
This fix should also speed up any exception that happens during allocation. There seem to be quite a few `THROW_*` uses in instanceKlass.cpp, for instance. This also applies to `NoClassDefFoundError`! `OptoRuntime::new_instance_C` ends with `deoptimize_caller_frame`, so it would benefit, and we have the call chain: `OptoRuntime::new_instance_C` -> `InstanceKlass::initialize` -> `InstanceKlass::initialize_impl` which contains `THROW_MSG(vmSymbols::java_lang_NoClassDefFoundError(), ss.as_string());` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25149#issuecomment-2875442142 From rcastanedalo at openjdk.org Tue May 13 08:40:52 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 May 2025 08:40:52 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> <_fLiVC2_bMj4oQ8k1__Y07Eyl-vAE4JrdjWbTfIR5QU=.94c2bfde-72ce-4db0-9d62-7b87c5067779@github.com> Message-ID: On Tue, 13 May 2025 06:16:53 GMT, Emanuel Peter wrote: > If I understand your statements above correctly: > The first load and any subsequent loads are all from the exact same address. Hence, if any were null-pointer, the first one has to be a null-pointer. Right. > Assuming this is correct, it seems that this follows: > Assuming the pointer is not a null-pointer, then wherever it points to cannot be moved by the GC. In your example code above, 0x10(%rsi) is the address, and presumably rsi refers to the base of some object, and 0x10 is the offset to a field. The object that rsi points to can thus not be moved by the GC, correct? But the object that the field at offset 0x10 points to may have been moved, and that is why we check its coloring, and then re-load from that field later. Does that sound correct to you?
> What guarantees that the object associated with rsi is not moved by the GC?

ZGC's inner workings guarantee that "root" addresses such as `%rsi` remain valid ("have a good color" in ZGC speak), but I am afraid I cannot offer a more detailed explanation. You may find more information in e.g. [1] (even though it is outdated by now, as it describes non-generational ZGC), or perhaps some GC engineer may chime in on the discussion and offer more detail?

In any case, to convince ourselves of the correctness of this RFE without needing to dive deep into ZGC internals, maybe it is enough to ensure that we preserve the same behavior as in mainline (where `zLoadP` cannot be used for implicit null checks). Here is how the compiled code looks for the above example before and after this change:

# Before the RFE (explicit null check):
    testq %rsi, %rsi        ; explicit null check on the base address
    je #uncommon_trap block
    movq 0x10(%rsi), %rax   ; main OOP load
    shrq $0xd, %rax         ; uncolor, destroys the OOP loaded in %rax
    ja #slow_barrier_path
  continue:
    (...)
  slow_barrier_path:
    movq 0x10(%rsi), %rax   ; re-load OOP that was destroyed by uncoloring
    (...)                   ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
    jmp #continue

# After the RFE (implicit null check):
    movq 0x10(%rsi), %rax   ; main OOP load with implicit exception: dispatches to #uncommon_trap block
    shrq $0xd, %rax         ; uncolor, destroys the OOP loaded in %rax
    ja #slow_barrier_path
  continue:
    (...)
  slow_barrier_path:
    movq 0x10(%rsi), %rax   ; re-load OOP that was destroyed by uncoloring
    (...)                   ; call into runtime (ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded(oopDesc*, oop*))
    jmp #continue

As you can see, both cases rely on the same assumptions about the validity of `%rsi` throughout the execution of the compiled code.

[1] Albert Mingkun Yang and Tobias Wrigstad. Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK. In ACM TOPLAS, 2022.
https://doi.org/10.1145/3538532 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875559501 From thartmann at openjdk.org Tue May 13 08:53:57 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 May 2025 08:53:57 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: <1P_0NKxAumQ7K6LwbtQZLJ8Kx_SXUFz5QyiE2V4WmM0=.1d8e9832-d257-418f-863d-3f8d6f228aeb@github.com> On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates an additional nop with the madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump inside the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an additional nop for Cortex-A53, which breaks the offset calculation logic. > > The current offset calculation logic uses a shift instead of a multiplication, since a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument for the offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes.
> > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. This seems to be a bit stuck and since we are getting closer to RDP 1 for JDK 25 (June 05, 2025), would it make sense to integrate the patch as-is and file a follow-up RFE to investigate potential other issues? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2875600362 From thartmann at openjdk.org Tue May 13 08:54:56 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 May 2025 08:54:56 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. 
Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Just wondering, since we are getting closer to RDP 1 for JDK 25 (June 05, 2025), should we defer this to JDK 26? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-2875595693 From roland at openjdk.org Tue May 13 08:54:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 13 May 2025 08:54:57 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: <6B11oNmi2IR3aoA4oAg8Vpub0qkOGHHHpeDpSgUCVwY=.1f8e76f7-5816-458e-a5c6-c164f0cc8f54@github.com> On Tue, 13 May 2025 08:49:56 GMT, Tobias Hartmann wrote: > Just wondering, since we are getting closer to RDP 1 for JDK 25 (June 05, 2025), should we defer this to JDK 26? Deferring makes sense. This is a corner case anyway. I've been reworking the patch and it's getting more complicated so it will likely need more time for reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-2875602822 From rcastanedalo at openjdk.org Tue May 13 08:55:55 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 May 2025 08:55:55 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 06:08:38 GMT, Emanuel Peter wrote: > I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? Regarding code, I recommend starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful).
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875604527 From thartmann at openjdk.org Tue May 13 08:57:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 May 2025 08:57:52 GMT Subject: RFR: 8354282: C2: more crashes in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 10 Apr 2025 15:15:54 GMT, Roland Westrelin wrote: > This is a variant of 8332827. In 8332827, an array access becomes > dependent on a range check `CastII` for another array access. When, > after loop opts are over, that RC `CastII` was removed, the array > access could float and an out of bound access happened. With the fix > for 8332827, RC `CastII`s are no longer removed. > > With this one what happens is that some transformations applied after > loop opts are over widen the type of the RC `CastII`. As a result, the > type of the RC `CastII` is no longer narrower than that of its input, > the `CastII` is removed and the dependency is lost. > > There are 2 transformations that cause this to happen: > > - after loop opts are over, the type of the `CastII` nodes are widen > so nodes that have the same inputs but a slightly different type can > common. > > - When pushing a `CastII` through an `Add`, if of the type both inputs > of the `Add`s are non constant, then we end up widening the type > (the resulting `Add` has a type that's wider than that of the > initial `CastII`). > > There are already 3 types of `Cast` nodes depending on the > optimizations that are allowed. Either the `Cast` is floating > (`depends_only_test()` returns `true`) or pinned. Either the `Cast` > can be removed if it no longer narrows the type of its input or > not. We already have variants of the `CastII`: > > - if the Cast can float and be removed when it doesn't narrow the type > of its input. > > - if the Cast is pinned and be removed when it doesn't narrow the type > of its input. 
> > - if the Cast is pinned and can't be removed when it doesn't narrow > the type of its input. > > What we need here, I think, is the 4th combination: > > - if the Cast can float and can't be removed when it doesn't narrow > the type of its input. > > Anyway, things are becoming confusing with all these different > variants named in ways that don't always help figure out what > constraints one of them operate under. So I refactored this and that's > the biggest part of this change. The fix consists in marking `Cast` > nodes when their type is widen in a way that prevents them from being > optimized out. > > Tobias ran performance testing with a slightly different version of > this change and there was no regression. Sounds good, I'll defer it to JDK 26 then. Thanks for the quick reply! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24575#issuecomment-2875610362 From mhaessig at openjdk.org Tue May 13 09:11:08 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 13 May 2025 09:11:08 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v5] In-Reply-To: References: Message-ID: > This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the name phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >
> Output with `-XX:PrintPhaseLevel=2` > > >> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java > CompileCommand: compileonly TestLoop.test10 bool compileonly = true > CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true > 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) > 3584 99 b 3 TestLoop::test10 (64 bytes) > 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > 8. PhaseIdealLoop 3 > 9. Before PhaseCCP 1 > 10. PhaseCCP 1 > 11. Iter GVN 2 > 12. PhaseIdealLoop iterations > 13. After Loop Optimizations > 14. After Macro Expansion > 15. Barrier expand > 16. Optimize finished > 17. Before matching > 18. After matching > 19. Global code motion > 20. Register Allocation > 21. Final Code > 3668 103 b 4 TestLoop::test10 (64 bytes) > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > 8. PhaseIdealLoop 3 > 9. Before PhaseCCP 1 > 10. PhaseCCP 1 > 11. Iter GVN 2 > 12. PhaseIdealLoop iterations > 13. PhaseIdealLoop iterations 2 > 14. PhaseIdealLoop iterations 3 > 15. PhaseIdealLoop iterations 4 > 16. PhaseIdealLoop iterations 5 > 17. PhaseIdealLoop iterations 6 > 18. PhaseIdealLoop iterations 7 > 19. PhaseIdealLoop iterations 8 > 20. PhaseIdealLoop iterations 9 > 21. After Loop Optimizations > 22. After Macro Expansion > 23. Barrier expand > 24. Optimize finished > 25. Before matching > 26. After matching > 27. Global code motion > 28. Register Allocation > 29. Final Code > >
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo... Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: Move functions into !PRODUCT ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25183/files - new: https://git.openjdk.org/jdk/pull/25183/files/a1b120f6..06b9d49a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=03-04 Stats: 26 lines in 2 files changed: 5 ins; 18 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25183/head:pull/25183 PR: https://git.openjdk.org/jdk/pull/25183 From mhaessig at openjdk.org Tue May 13 09:13:44 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 13 May 2025 09:13:44 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 16:28:35 GMT, Vladimir Kozlov wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> No tty lock needed for one print > > src/hotspot/share/opto/compile.hpp line 671: > >> 669: bool should_print_igv(int level); >> 670: bool should_print_phase(int level) const; >> 671: bool should_print_ideal_phase(CompilerPhaseType cpt) const; > > You can use macro `PRODUCT_RETURN_(return false;);` and put both methods under `#ifndef PRODUCT` in .cpp file. Thanks for pointing this out. I actually realized that these functions only have callers from `NOT_PRODUCT` and they implement functionality for development flags. So instead of your suggestion, I moved everything into `#ifndef PRODUCT`.
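For readers skimming the archive, the two options discussed in this exchange can be contrasted in a small stand-alone sketch. This is not HotSpot code: `DemoCompile`, its `print_phase_level` field, the level comparison, and `demo_phase_usage` are hypothetical, and the macro is only a local stand-in modeled on how `PRODUCT_RETURN_` is used in the quoted review comment.

```cpp
#include <cassert>

// Local stand-in (assumption): in a PRODUCT build the declaration receives an
// inline body; in any other build it stays a plain declaration.
#ifdef PRODUCT
#define PRODUCT_RETURN_(code) { code }
#else
#define PRODUCT_RETURN_(code) /* next token must be ; */
#endif

struct DemoCompile {
  int print_phase_level = 2;  // hypothetical stand-in for a develop flag

  // Option A (reviewer's suggestion): always declared, stubbed out in PRODUCT.
  bool should_print_phase(int level) const PRODUCT_RETURN_(return false;);

#ifndef PRODUCT
  // Option B (what the author reports doing): the declaration itself only
  // exists in non-PRODUCT builds, so PRODUCT code cannot call it at all.
  bool should_print_ideal_phase(int level) const;
#endif
};

#ifndef PRODUCT
// Real bodies live behind the same guard, as they would in the .cpp file.
bool DemoCompile::should_print_phase(int level) const {
  return level <= print_phase_level;  // hypothetical rule for illustration
}

bool DemoCompile::should_print_ideal_phase(int level) const {
  return should_print_phase(level);
}

// Usage in a non-PRODUCT build.
inline void demo_phase_usage() {
  DemoCompile c;
  assert(c.should_print_phase(1));
  assert(!c.should_print_phase(3));
}
#endif
```

With option A, callers compile in every build flavor and simply get `false` in PRODUCT; with option B, a stray PRODUCT caller becomes a compile error, which fits functions whose only callers are already `NOT_PRODUCT`.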
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25183#discussion_r2086322775 From adinn at openjdk.org Tue May 13 09:20:54 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 May 2025 09:20:54 GMT Subject: RFR: 8356774: AArch64: StubGen final stubs buffer too small for ZGC on Cavium CPU In-Reply-To: References: Message-ID: On Tue, 13 May 2025 08:57:14 GMT, Aleksey Shipilev wrote: > I am a bit confused: does the same logic we used in [JDK-8356085](https://bugs.openjdk.org/browse/JDK-8356085) [fix](https://github.com/openjdk/jdk/commit/daf6fa1e6153d3fdf48ef0840790794e57349c38) applies here as well? E.g. should this final stub size even depend on ZGC build-time presence at all? Before cleanup different arches used to generate memory copy stubs in either the compiler or final blobs but it was not uniform. After cleanup all arches generate them in the final blob. So, this has necessitated two different adjustments: 1) The fix for JDK-8356085 removed ZGC-specific space from the compiler blob size and moved it to the default allocation. There was no need for a special ZGC increment because none of the stubs include code generated by the barrier-set assembler. However, building with ZGC excluded revealed that the default allocation was too small for the normal requirement and only worked because of ZGC-derived slop. So, rather than remove the ZGC allocation it was added to the default. 2) This PR addresses insufficient storage for copy stubs i.e. changes the final blob size. It does not need to change the default allocation since that is sufficient for a build without ZGC on any HW, including Cavium ThunderX (you can verify that if you run a build that does include ZGC and specify -Xlog:stubs -XX:-UseZGC). The problem is that the extra space added to cater for ZGC is big enough on most hardware but not on ThunderX. That's because the array copy routines on ThunderX include code to handle unaligned copies which other arches do not inject. 
With ZGC enabled this means a lot more barrier code, which blows the extra ZGC-allocated budget. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25203#issuecomment-2875681919 From duke at openjdk.org Tue May 13 09:28:59 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 13 May 2025 09:28:59 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v12] In-Reply-To: References: Message-ID: <6KvzwYs_3tFO6apkt-DUklIqUF1i4F0d8CMgPVROEFo=.5b324f6d-c2e4-43e0-9fc4-bcddc12e8a63@github.com> On Thu, 17 Apr 2025 10:07:08 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Remove unused code >> - Move code to addnode.cpp and add more tests >> - Merge remote-tracking branch 'origin/master' into dev/merge_loads >> - Fix test >> - Add more tests >> - Enable StressIGVN and riscv platform >> - Change tests as review comments >> - Fix test failure and change for review comments >> - Revert extract value and add more tests >> - ... and 4 more: https://git.openjdk.org/jdk/compare/660b17a6...f6518b26 > > src/hotspot/share/opto/addnode.cpp line 946: > >> 944: * | ((UNSAFE.getByte(array, address + 3) & 0xff) << 24); >> 945: */ >> 946: bool MergePrimitiveLoads::is_merged_load_candidate() const { > > What is your definition of candidate here? It seems to have something to do about not having a `LShift`, why? Maybe you can give a good definition here or somewhere else? The comment is updated. Only the last node in the combine operator chain is the candidate.
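For readers who have not seen the source pattern that MergeLoads targets, here is a minimal Java sketch of the idea, based only on the comment fragment quoted above. It assumes little-endian byte order and a plain `byte[]` rather than `UNSAFE.getByte`; the class and method names (`MergeLoadsSketch`, `loadIntBytewise`, `loadIntMerged`) are made up for illustration and are not from the PR. The first method is the byte-wise shape (four adjacent loads combined with shifts and ORs), the second reads the same value with a single 4-byte access.

```java
final class MergeLoadsSketch {
    // Byte-wise form: four adjacent 1-byte loads, each masked, shifted and OR-ed
    // together. This mirrors the shape of the comment quoted in the review
    // (little-endian assumed for this sketch).
    static int loadIntBytewise(byte[] a, int off) {
        return  (a[off]     & 0xff)
             | ((a[off + 1] & 0xff) << 8)
             | ((a[off + 2] & 0xff) << 16)
             | ((a[off + 3] & 0xff) << 24);
    }

    // Equivalent single 4-byte read, written by hand with the standard
    // ByteBuffer API. This only illustrates the target of the merge; it is
    // not how C2 implements it.
    static int loadIntMerged(byte[] a, int off) {
        return java.nio.ByteBuffer.wrap(a)
                .order(java.nio.ByteOrder.LITTLE_ENDIAN)
                .getInt(off);
    }

    public static void main(String[] args) {
        byte[] data = {0x78, 0x56, 0x34, 0x12, (byte) 0xff, 0x00, 0x00, (byte) 0x80};
        for (int off = 0; off + 4 <= data.length; off++) {
            int slow = loadIntBytewise(data, off);
            int fast = loadIntMerged(data, off);
            System.out.println(off + ": " + Integer.toHexString(slow)
                    + " / " + Integer.toHexString(fast));
        }
    }
}
```

In this shape the outermost OR is the last operator of the combine chain, which appears to be the node the reply above calls the candidate.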
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2086352141 From adinn at openjdk.org Tue May 13 09:32:56 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 May 2025 09:32:56 GMT Subject: RFR: 8356774: AArch64: StubGen final stubs buffer too small for ZGC on Cavium CPU In-Reply-To: References: Message-ID: On Tue, 13 May 2025 08:47:37 GMT, Andrew Dinn wrote: > Increased final stubs buffer size to allow extra space when running with ZGC enabled on Cavium ThunderX. Cavium is special because it generates stub code to handle unaligned copies. With ZGC enabled this implies a lot more injected barrier code. n.b. we could always allocate enough storage for ZGC as we do for every other GC. However, that wastes a lot of space when building without ZGC. This is not an uncommon case: the jmod binary used during the build omits ZGC (which is how we found the problem when cross-compiling). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25203#issuecomment-2875727614 From duke at openjdk.org Tue May 13 09:36:58 2025 From: duke at openjdk.org (kuaiwei) Date: Tue, 13 May 2025 09:36:58 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15] In-Reply-To: References: Message-ID: <8yy6SHrbrMjIoOKxIiIZnzOPc0UiSrKH8bgcRw9xHdU=.1580bede-5838-4467-beb4-1b6d065df4e6@github.com> On Fri, 2 May 2025 09:50:35 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build error on mac and windows > > src/hotspot/share/opto/addnode.cpp line 1041: > >> 1039: Node* oper = _combine; >> 1040: NOT_PRODUCT(int steps = 0;) // prevent dead loop in bad graph >> 1041: while (load == nullptr NOT_PRODUCT(&& steps < 30)) { > > And just saw this when flying by. > What "bad graph" is this? What is the "dead loop" here? > What happens in product, since there you don't have this check? 
It should not enter a dead loop when we walk up through the combine operators, unless the graph is invalid and a child node uses one of its ancestor nodes. I didn't see such a 'bad graph' in my tests. The check acts like an assertion in debug mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2086368521 From amitkumar at openjdk.org Tue May 13 09:38:54 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 13 May 2025 09:38:54 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: <2NUKCBO7aaoQYPLVWn_rJ4nL28qtgm1OqeD6Zhil2mQ=.f5eca835-22bf-44c1-a2e1-71bdf1cd9401@github.com> <1TYgAXK73h2YE6-vEvg1wKEmLiqrl88fa5OiSkPu0qU=.0050c295-0bb9-4a2e-a81f-fcb08e24efe5@github.com> Message-ID: On Mon, 12 May 2025 09:07:29 GMT, Martin Doerr wrote: > > If we don't go ahead with mvc, then we are seeing regression, as you have noticed in the previous result. > > Are these corner cases relevant at all? I am not sure about that. But the hit was significant in the 255- and 256-byte cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2875758855 From shade at openjdk.org Tue May 13 09:39:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 May 2025 09:39:50 GMT Subject: RFR: 8356774: AArch64: StubGen final stubs buffer too small for ZGC on Cavium CPU In-Reply-To: References: Message-ID: On Tue, 13 May 2025 08:47:37 GMT, Andrew Dinn wrote: > Increased final stubs buffer size to allow extra space when running with ZGC enabled on Cavium ThunderX. Cavium is special because it generates stub code to handle unaligned copies. With ZGC enabled this implies a lot more injected barrier code. All right then! Let's do this version. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25203#pullrequestreview-2835996263 From adinn at openjdk.org Tue May 13 09:44:56 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 May 2025 09:44:56 GMT Subject: RFR: 8356774: AArch64: StubGen final stubs buffer too small for ZGC on Cavium CPU In-Reply-To: References: Message-ID: On Tue, 13 May 2025 08:47:37 GMT, Andrew Dinn wrote: > Increased final stubs buffer size to allow extra space when running with ZGC enabled on Cavium ThunderX. Cavium is special because it generates stub code to handle unaligned copies. With ZGC enabled this implies a lot more injected barrier code. Thanks for the review, Aleksey. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25203#issuecomment-2875774902 From adinn at openjdk.org Tue May 13 09:44:56 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Tue, 13 May 2025 09:44:56 GMT Subject: Integrated: 8356774: AArch64: StubGen final stubs buffer too small for ZGC on Cavium CPU In-Reply-To: References: Message-ID: <1hLWtJdV1NiMGrBdZEWTTi_em48H9YdmPA86AK37iPM=.f979a6c2-15ff-4ddd-93e4-3740bc4b42ad@github.com> On Tue, 13 May 2025 08:47:37 GMT, Andrew Dinn wrote: > Increased final stubs buffer size to allow extra space when running with ZGC enabled on Cavium ThunderX. Cavium is special because it generates stub code to handle unaligned copies. With ZGC enabled this implies a lot more injected barrier code. This pull request has now been integrated. 
Changeset: 8ffc121b Author: Andrew Dinn URL: https://git.openjdk.org/jdk/commit/8ffc121b2fc6353d5419c2437d92911baac16b6b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8356774: AArch64: StubGen final stubs buffer too small for ZGC on Cavium CPU Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/25203 From epeter at openjdk.org Tue May 13 09:53:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 09:53:52 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Mon, 12 May 2025 12:11:36 GMT, Jatin Bhateja wrote: >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) > >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) > > Please use the latest version @jatin-bhateja Ah ok. Last tests had passed, but I'll re-run now with your newest updates. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2875803888 From epeter at openjdk.org Tue May 13 09:59:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 09:59:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 08:53:08 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz Thanks for the explanations! >> I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do.
>> >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? >> >> Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load. > >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? > > Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). > > Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). @robcasloz Alright, to me this sounds convincing. I suggest you add a comment about this assumption, i.e. that the address we load from is always the same. 
And then let a GC engineer have a look at this PR, to confirm that this assumption is always correct, and that there is not some other path where the address could change ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2875820583 From mdoerr at openjdk.org Tue May 13 10:02:58 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 May 2025 10:02:58 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct 
%r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation The invariant on other platforms is that all Bytes before the non-writable address have been written when hitting a signal. I don' know if that is really required on s390. It may be a risk to use a different behavior. The code can be used to write memory mapped files or other stuff. If this behavior is not required, why not use mvc always? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2875829475 From thartmann at openjdk.org Tue May 13 10:29:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 May 2025 10:29:53 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder In-Reply-To: References: Message-ID: On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote: > Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). 
The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. > > Here, the problem is that the array allocation function, which throws when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers`, which uses compiled handlers instead of deopting, was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag has been used in testing since October 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default and to add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improves the performance to match that of C1 (both for direct array allocation, and `StringBuilder` construction). For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), and everything seems fine, ignoring the usual noise. I've checked with @dougxc: it shouldn't impact Graal as it doesn't use `OptoRuntime`. I agree that such exceptional cases are usually not worth optimizing but in this case we already emitted all the code to handle them, so why deopt? The change looks good to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25149#pullrequestreview-2836166665 From avoitylov at openjdk.org Tue May 13 10:49:54 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Tue, 13 May 2025 10:49:54 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: <1P_0NKxAumQ7K6LwbtQZLJ8Kx_SXUFz5QyiE2V4WmM0=.1d8e9832-d257-418f-863d-3f8d6f228aeb@github.com> References: <1P_0NKxAumQ7K6LwbtQZLJ8Kx_SXUFz5QyiE2V4WmM0=.1d8e9832-d257-418f-863d-3f8d6f228aeb@github.com> Message-ID: On Tue, 13 May 2025 08:51:40 GMT, Tobias Hartmann wrote: >> The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates an additional nop with the madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump inside the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an additional nop for Cortex-A53, which breaks the offset calculation logic. >> >> The current offset calculation logic uses a shift instead of a multiplication, since a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop to each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument for the offset calculation logic is increased by 1, because each loop iteration has twice as many instructions on Cortex-A53. >> >> This fix was tested on Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes.
>> >> The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. > > This seems to be a bit stuck and since we are getting closer to RDP 1 for JDK 25 (June 05, 2025), would it make sense to integrate the patch as-is and file a follow-up RFE to investigate potential other issues? Thanks @TobiHartmann, I filed JDK-8356856 with some ideas on what could be investigated and improved further. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2875986207 From thartmann at openjdk.org Tue May 13 10:51:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 13 May 2025 10:51:53 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling In-Reply-To: References: Message-ID: On Fri, 9 May 2025 11:31:27 GMT, Marc Chevalier wrote: > If anyone has another idea, I'd be happy to move it. What about keeping track of the peeling in `LoopNode`? We already have an `_unswitch_count` there to limit loop unswitching. src/hotspot/share/opto/compile.cpp line 666: > 664: _congraph(nullptr), > 665: NOT_PRODUCT(_igv_printer(nullptr) COMMA) > 666: NOT_PRODUCT(_peeling_rounds_of_node(comp_arena(), 8, 0, Pair(0, 0)) COMMA) `NOT_PRODUCT` means that it's also available in the optimized build but you only want/need it in debug. 
src/hotspot/share/opto/compile.cpp line 5295: > 5293: > 5294: uint& Compile::peeling_rounds_at_node(const Node* const head) { > 5295: for(int i = 0; i < _peeling_rounds_of_node.length(); ++i) { Suggestion: for (int i = 0; i < _peeling_rounds_of_node.length(); ++i) { src/hotspot/share/opto/compile.cpp line 5297: > 5295: for(int i = 0; i < _peeling_rounds_of_node.length(); ++i) { > 5296: auto& head_and_round_count = _peeling_rounds_of_node.at(i); > 5297: if(head_and_round_count.first == head->_idx) { Suggestion: if (head_and_round_count.first == head->_idx) { src/hotspot/share/opto/loopTransform.cpp line 506: > 504: > 505: // Check for vectorized loops, any peeling done was already applied. > 506: // Peeling is not legal here, we don't even stress peel! Should this comment go to the `return 0;` and also describe why it's not legal? ------------- PR Review: https://git.openjdk.org/jdk/pull/25140#pullrequestreview-2836183709 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2086497092 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2086491129 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2086491377 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2086492477 From duke at openjdk.org Tue May 13 11:02:54 2025 From: duke at openjdk.org (Ulrich Weigand) Date: Tue, 13 May 2025 11:02:54 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: <8qjDACyV6xL20tkJRlWm_Ur6FVV1OPryEgh90dqRlug=.9095bba3-f3e4-4f2c-8c1f-c80d99e973a2@github.com> On Tue, 13 May 2025 09:59:49 GMT, Martin Doerr wrote: > The invariant on other platforms is that all Bytes before the non-writable address have been written when hitting a signal. I don' know if that is really required on s390. It may be a risk to use a different behavior. The code can be used to write memory mapped files or other stuff. If this behavior is not required, why not use mvc always? 
I thought the reason for not using mvc always is atomicity within array elements? That is, if you're writing an array of 4- or 8-byte values, then the change to every one of those array elements should be atomic w.r.t. other CPUs. If that is true, you cannot use mvc. (However, that requirement would not be relevant for arrays of 1-byte values.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2876031937 From mchevalier at openjdk.org Tue May 13 11:31:04 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 13 May 2025 11:31:04 GMT Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 Message-ID: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> On Linux, `assert` and such eventually use `abort`, which gives the return code 134 (128 + 6, the code of SIGABRT/SIGIOT). On Windows, dying returns `-1` (exit codes are more-or-less signed ints on Windows): https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 So let's make the IR framework aware of this: we consider there was a JVM error if the OS is Windows and the return code is -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing.
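The rule described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code from the PR: the class name `JvmErrorExitCode` and its methods are hypothetical, and the OS check through the `os.name` property is just one simple way to do it with nothing but the standard library.

```java
final class JvmErrorExitCode {
    // 128 + 6 (SIGABRT): the exit code the message above reports after abort() on Linux.
    static final int ABORT_EXIT_CODE = 128 + 6;
    // Exit code the message above reports for a dying VM on Windows.
    static final int WINDOWS_ERROR_EXIT_CODE = -1;

    // Hypothetical helper: decides "is this Windows?" from an os.name-style string.
    static boolean isWindows(String osName) {
        return osName.toLowerCase(java.util.Locale.ROOT).contains("windows");
    }

    // True if the exit code matches what a JVM error is expected to produce on this OS.
    static boolean looksLikeJvmError(int exitCode, String osName) {
        int expected = isWindows(osName) ? WINDOWS_ERROR_EXIT_CODE : ABORT_EXIT_CODE;
        return exitCode == expected;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(os + ", exit 134: " + looksLikeJvmError(134, os));
        System.out.println(os + ", exit -1:  " + looksLikeJvmError(-1, os));
    }
}
```

Passing the OS name as a parameter keeps the decision testable on any host, which is why the sketch does not read the system property inside `looksLikeJvmError`.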
------------- Commit messages: - Correctly detect JVM errors on windows Changes: https://git.openjdk.org/jdk/pull/25200/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25200&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325647 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25200/head:pull/25200 PR: https://git.openjdk.org/jdk/pull/25200 From epeter at openjdk.org Tue May 13 11:34:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 11:34:54 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 09:25:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e.
falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that do not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add jsr to falls_through() Looks reasonable to me :) src/hotspot/share/runtime/deoptimization.cpp line 845: > 843: > 844: #ifndef PRODUCT > 845: // Return true if the execution after the provided bytecode continues at the Suggestion: // Return true if the execution after the provided bytecode can continue at the Nit: Because a `cmp_if` may or may not continue with the next bci. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25118#pullrequestreview-2836355702 PR Review Comment: https://git.openjdk.org/jdk/pull/25118#discussion_r2086593337 From chagedorn at openjdk.org Tue May 13 11:41:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 13 May 2025 11:41:57 GMT Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 In-Reply-To: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> Message-ID: <3C-T8Ib0RcLtyxhZomQQNAx6Hf0eHKwjdHQk98tHYVs=.6658ceaa-49d2-419c-b1f5-9b5d95a74cd1@github.com> On Tue, 13 May 2025 08:03:21 GMT, Marc Chevalier wrote: > On Linux, `assert` and such eventually use `abort` which give the return code 134 (128 + 6 (code of SIGABRT/SIGIOT)). On Windows, dying returns `-1` (exit code are more-or-less-signed int on Windows): > > https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 > > So let's make the IR framework aware of this: we consider there was a JVM error if the OS is windows and the return code -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing. Thanks for fixing this! test/hotspot/jtreg/compiler/lib/ir_framework/driver/TestVMProcess.java line 262: > 260: String stdErr = oa.getStderr(); > 261: String stdOut = ""; > 262: boolean osIsWindows = System.getProperty("os.name").toLowerCase().contains("windows"); You can use `Platform.isWindows()` instead. ------------- Changes requested by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25200#pullrequestreview-2836396502 PR Review Comment: https://git.openjdk.org/jdk/pull/25200#discussion_r2086611206 From mhaessig at openjdk.org Tue May 13 11:54:53 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 13 May 2025 11:54:53 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 11:28:30 GMT, Emanuel Peter wrote: >> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: >> >> Add jsr to falls_through() > > src/hotspot/share/runtime/deoptimization.cpp line 845: > >> 843: >> 844: #ifndef PRODUCT >> 845: // Return true if the execution after the provided bytecode continues at the > > Suggestion: > > // Return true if the execution after the provided bytecode can continue at the > > Nit: Because a `cmp_if` may or may not continue with the next bci. I looked into that and both basic blocks following the `if_cmp` will always be reachable as far as `bb->is_reachable()` is concerned. The bytecode verification code for `if_cmp`, which checks that the stack depth is the same on both the then and else branches, requires both basic blocks to be reachable.
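To make the `falls_through()` discussion concrete, here is a small Java sketch of the classification being debated. It is not the HotSpot code (which is C++ in `deoptimization.cpp`); the class `FallsThroughSketch` and its method are hypothetical, and the set of opcodes is limited to the ones named in this thread (`areturn`, `{d,f,i,l}return`, `jsr`, `athrow`), so the real function may list others. The opcode values are the standard ones from the JVM specification.

```java
final class FallsThroughSketch {
    // Standard JVM opcode values (JVMS chapter 6).
    static final int ICONST_0  = 0x03;
    static final int JSR       = 0xa8;
    static final int IRETURN   = 0xac;
    static final int LRETURN   = 0xad;
    static final int FRETURN   = 0xae;
    static final int DRETURN   = 0xaf;
    static final int ARETURN   = 0xb0;
    static final int GETSTATIC = 0xb2;
    static final int ATHROW    = 0xbf;

    // Returns false for bytecodes after which execution never continues at the
    // next bytecode in the code array. Only opcodes mentioned in the thread are
    // listed; this is an assumption, not a copy of the HotSpot list.
    static boolean fallsThrough(int opcode) {
        switch (opcode) {
            case JSR:
            case IRETURN:
            case LRETURN:
            case FRETURN:
            case DRETURN:
            case ARETURN:
            case ATHROW:
                return false;
            default:
                return true;
        }
    }

    public static void main(String[] args) {
        // The sequence from the issue summary: getstatic; areturn; iconst_0 (unreachable).
        int[] code = {GETSTATIC, ARETURN, ICONST_0};
        for (int opcode : code) {
            System.out.println("0x" + Integer.toHexString(opcode)
                    + " falls through: " + fallsThrough(opcode));
        }
    }
}
```

With a table like this, a verifier that deopts at `areturn` would skip the stack check for the following, unreachable `iconst_0`.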
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25118#discussion_r2086635527 From epeter at openjdk.org Tue May 13 12:06:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 13 May 2025 12:06:53 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 11:51:59 GMT, Manuel Hässig wrote: >> src/hotspot/share/runtime/deoptimization.cpp line 845: >> >>> 843: >>> 844: #ifndef PRODUCT >>> 845: // Return true if the execution after the provided bytecode continues at the >> >> Suggestion: >> >> // Return true if the execution after the provided bytecode can continue at the >> >> Nit: Because a `cmp_if` may or may not continue with the next bci. > > I looked into that and both basic blocks following the `if_cmp` will always be reachable as far as `bb->is_reachable()` is concerned. The bytecode verification code of the `if_cmp` to check that the stack depth on both the then and else branch are the same requires both basic blocks to be reachable. Ah, I think there is a misunderstanding: I am saying that `if_cmp` does not always continue. Your statement seems to suggest that all the ones you return `true` for "continue at the next bytecode". That is missing some nuance. They `can` continue to there, but they do not always. I'm just asking for the wording to be more precise. You may even want to change the name of the whole function. `falls_through` suggests that they would always fall through. But you are rather asking for "does not continue at next bci", `has_no_fallthrough` or similar.
I leave it up to you if / what you want to do here :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25118#discussion_r2086657500 From mdoerr at openjdk.org Tue May 13 12:33:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 13 May 2025 12:33:56 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: <8qjDACyV6xL20tkJRlWm_Ur6FVV1OPryEgh90dqRlug=.9095bba3-f3e4-4f2c-8c1f-c80d99e973a2@github.com> References: <8qjDACyV6xL20tkJRlWm_Ur6FVV1OPryEgh90dqRlug=.9095bba3-f3e4-4f2c-8c1f-c80d99e973a2@github.com> Message-ID: On Tue, 13 May 2025 11:00:32 GMT, Ulrich Weigand wrote: > However, that requirement would not be relevant for arrays of 1-byte values. Correct. `Unsafe::setMemory` fills a memory region with 1-byte values. So, atomicity can't be a problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2876324631 From yzheng at openjdk.org Tue May 13 12:39:55 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 13 May 2025 12:39:55 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal the initialization is sufficiently fast to not impact startup noticeably. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. 
Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/25121#pullrequestreview-2836595615 From mhaessig at openjdk.org Tue May 13 12:41:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 13 May 2025 12:41:52 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 12:03:50 GMT, Emanuel Peter wrote: >> I looked into that and both basic blocks following the `if_cmp` will always be reachable as far as `bb->is_reachable()` is concerned. The bytecode verification code of the `if_cmp` to check that the stack depth on both the then and else branch are the same requires both basic blocks to be reachable. > > Ah, I think there is a misunderstanding: > I am saying that `if_cmp` does not always continue. Your statement seems to suggest that all the ones you return `true` for "continue at the next bytecode". That is missing some nuance. They `can` continue to there, but they do not always. I'm just asking for the wording to be more precise.
> > You may even want to change the name of the whole function. `falls_through` suggests that they would always fall through. But you are rather asking for "does not continue at next bci", `has_no_fallthrough` or similar. > > I leave it up to you if / what you want to do here :) Ah, I got confused again between "should be in `falls_through()`" and "can trigger this assert". `if_cmp` cannot trigger the assert, but could be in `falls_through()`. Since it cannot fail as per our current understanding and because it does not fit the current semantics of `falls_through()`, I would opt to leave it as it is. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25118#discussion_r2086722901 From bkilambi at openjdk.org Tue May 13 12:49:32 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 13 May 2025 12:49:32 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 10:10:58 GMT, Bhavana Kilambi wrote: >> Sounds good to me. But I'm worried it may crash with bad ad file on AArch64 if the Vector API java and compiler IR part is ready for HF types, while the AArch64-related masked rules are missing. Because the masked vector IR has been generated, while the codegen is missing on AArch64. We have to add the HF ops to `match_rule_supported_vector_masked` first, and then remove them when adding the masked version rules. WDYT? > > It's a good idea to disable masked ops in the backend. That should help not generate the masked IR in the inline expanders. I'll update this PR soon. Thanks for your comments. Done. Thanks!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2086737010 From bkilambi at openjdk.org Tue May 13 12:49:32 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 13 May 2025 12:49:32 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25096/files - new: https://git.openjdk.org/jdk/pull/25096/files/56edd6df..080a0ce8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=00-01 Stats: 123 lines in 3 files changed: 26 ins; 95 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25096/head:pull/25096 PR: https://git.openjdk.org/jdk/pull/25096 From bkilambi at openjdk.org Tue May 13 12:49:32 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 13 May 2025 12:49:32 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 03:48:20 GMT, Hao Sun wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 225: > >> 223: case Op_MaxVHF: >> 224: case Op_SqrtVHF: >> 225: // FEAT_FP16 is enabled if both "fphp" and
"asimdhp" features are supported. > > It's a unary op and we should add it to `is_vector_unary_op_name` in `adlc/dfa.cpp`. > See the related code in the previous patch https://github.com/openjdk/jdk/pull/9534 Done. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2086736470 From duke at openjdk.org Tue May 13 13:04:28 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 13 May 2025 13:04:28 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub Message-ID: The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and INT granules in order. This could trigger misaligned accesses. We should copy the remainder in this order: INT, SHORT, and BYTE to avoid such an issue. JMH data on P550 SBC for reference (@Param("15") private int size): Before: Benchmark (size) Mode Cnt Score Error Units ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op After: Benchmark (size) Mode Cnt Score Error Units ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op ArrayFill.zeroShortArray 15 avgt 12 31.492 ±
0.006 ns/op ------------- Commit messages: - RISCV: fix alignment of generate_fill after fill_word Changes: https://git.openjdk.org/jdk/pull/25210/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25210&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356869 Stats: 19 lines in 1 file changed: 5 ins; 5 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/25210.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25210/head:pull/25210 PR: https://git.openjdk.org/jdk/pull/25210 From duke at openjdk.org Tue May 13 13:15:08 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 13 May 2025 13:15:08 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v8] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic considerably reduces the time spent in Unsafe::setMemory. > > On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below. > > Before the patch: > `Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ±
0.879 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30... Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: Improve tail handling alignment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/918bf5aa..f57086b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=06-07 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From mli at openjdk.org Tue May 13 13:29:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 13 May 2025 13:29:55 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization In-Reply-To: References: Message-ID: On Tue, 13 May 2025 03:16:56 GMT, Fei Yang wrote: >> Hi, >> Can you help to review this patch? >> It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. >> Thanks!
>> >> ## Test >> >> performance test running in progress ... > > src/hotspot/cpu/riscv/riscv_v.ad line 125: > >> 123: return UseZvfh; >> 124: case Op_FmaVHF: >> 125: return UseZvfh && UseFMA; > > Maybe group with the existing two cases at L98 and L99 (Op_VectorCastHF2F / Op_VectorCastF2HF)? Make sense, will fix. > src/hotspot/cpu/riscv/riscv_v.ad line 382: > >> 380: ins_encode %{ >> 381: assert(UseZvfh, "must"); >> 382: BasicType bt = Matcher::vector_element_basic_type(this); > > Question: What is `bt` calculated here? Seems there isn't one for HF16 in `enum BasicType` definition in file src/hotspot/share/utilities/globalDefinitions.hpp. I only see `T_FLOAT` and `T_DOUBLE`, which I don't think is usable here as we need to set SEW=16 for this instruction. No, it uses T_SHORT instead, in Float16.java it also uses a short as underlying payload. And if you check the generated assembly code, you'll find some code like `vsetivli t0,16,e16,m1,tu,mu`. To avoid confusion, I will add an assertion here so that it can be understood later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2086829723 PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2086829293 From mli at openjdk.org Tue May 13 13:43:06 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 13 May 2025 13:43:06 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. > Thanks! > > ## Test > > performance test running in progress ... 
Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25181/files - new: https://git.openjdk.org/jdk/pull/25181/files/6eb66599..0ad3b5a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25181&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25181&range=00-01 Stats: 14 lines in 1 file changed: 11 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25181/head:pull/25181 PR: https://git.openjdk.org/jdk/pull/25181 From shade at openjdk.org Tue May 13 13:47:09 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 13 May 2025 13:47:09 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v16] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often.
> > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Fix release builds - More thorough locking and redefinition escape hatch - Fix build failures: add more headers - Tracking UMH state more accurately - Rework for safer concurrency - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... 
and 19 more: https://git.openjdk.org/jdk/compare/48d2acb3...59798bdb ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=15 Stats: 422 lines in 12 files changed: 379 ins; 19 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From aph at openjdk.org Tue May 13 14:17:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 13 May 2025 14:17:14 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v4] In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: <_YHxBGEeUL7hfvcstwV9F0BCAC-ijDjoI__DrDAQuqM=.5e56c31b-5cf1-447f-83cf-2166dccf7b0e@github.com> > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). > > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op > MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op > MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op > MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ±
0.232 ns/op Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25147/files - new: https://git.openjdk.org/jdk/pull/25147/files/47179d57..23838f8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=02-03 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147 PR: https://git.openjdk.org/jdk/pull/25147 From aph at openjdk.org Tue May 13 14:46:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 13 May 2025 14:46:54 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > The current offset calculation logic uses a shift instead of a multiplication, which relies on a power-of-2 number of instructions being present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power-of-2. To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53.
> > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. OK. I don't like this much, but it'll do. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24489#pullrequestreview-2837088538 From kvn at openjdk.org Tue May 13 15:34:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 May 2025 15:34:52 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2.
Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT Marked as reviewed by kvn (Reviewer). @dougxc please remind me. Is it true that with current libgraal no Java code is executed when it is initialized? Or you still have calls into core library? ------------- PR Review: https://git.openjdk.org/jdk/pull/25121#pullrequestreview-2837264646 PR Comment: https://git.openjdk.org/jdk/pull/25121#issuecomment-2876984838 From never at openjdk.org Tue May 13 15:39:51 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 13 May 2025 15:39:51 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. 
For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25121#pullrequestreview-2837280275 From dnsimon at openjdk.org Tue May 13 15:53:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 15:53:53 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: <5ryTduYlJ4b6MFzxFmjZaXl8Y7LhX5fG2TIPWXKs2dk=.c6840419-1b67-47b5-953c-437e36cf1cc0@github.com> On Tue, 13 May 2025 15:30:03 GMT, Vladimir Kozlov wrote: > @dougxc please remind me. Is it true that with current libgraal no Java code is executed when it is initialized? Or you still have calls into core library? There are still some calls to `CompilerToVM.lookupType` during libgraal initialization but I think all the types it looks up will already be resolved so will not require Java code execution. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25121#issuecomment-2877051478 From dnsimon at openjdk.org Tue May 13 16:02:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 16:02:00 GMT Subject: Integrated: 8356447: Change default for EagerJVMCI to true In-Reply-To: References: Message-ID: On Thu, 8 May 2025 14:44:55 GMT, Doug Simon wrote: > By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. > > The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: > 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). > 2. Stop the VM before any application code can be executed. This is just good hygiene. > > This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. > This is done for both libgraal and jargraal so that the behavior is uniform. Since jargraal is now a development configuration, VM startup costs are not critical. This pull request has now been integrated. 
Changeset: 08b2df80 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/08b2df80c68e182fbf6b1fc94e991c02b23040ec Stats: 32 lines in 6 files changed: 29 ins; 0 del; 3 mod 8356447: Change default for EagerJVMCI to true Reviewed-by: yzheng, kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/25121 From dnsimon at openjdk.org Tue May 13 16:01:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 13 May 2025 16:01:59 GMT Subject: RFR: 8356447: Change default for EagerJVMCI to true [v2] In-Reply-To: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> References: <4rohFHtNW1xFl9DQ47qqySsYnYxtfrO7-UZ--L3CRmA=.06aa514d-1846-47ae-b7bd-7535bed88fcb@github.com> Message-ID: On Tue, 13 May 2025 06:52:27 GMT, Doug Simon wrote: >> By default, JVMCI and Graal initialization only occurs on the first top-tier (i.e. tier 4) JIT compilation request. This made sense prior to libgraal where the initialization was interpreted and so noticeably contributed to VM startup. However, with libgraal, the initialization is sufficiently fast to not impact startup. >> >> The motivation for JVMCI and Graal eager initialization by default is to make Graal command line option processing happen in the same VM phase as handling of all other VM command line flags. That is, errors in Graal options should: >> 1. Happen deterministically, not just for apps that run long enough to trigger a top tier JIT compilation. For example: `java -XX:+UnlockExperimentalVMOptions -XX:+UseGraalJIT --version`. In a JDK build that does not include Graal, this may succeed (and print out the version info) or result in a VM error ("Cannot use JVMCI compiler: No JVMCI compiler found"). >> 2. Stop the VM before any application code can be executed. This is just good hygiene. >> >> This PR makes JVMCI initialization eager by default if `UseJVMCICompiler` is true. >> This is done for both libgraal and jargraal so that the behavior is uniform. 
Since jargraal is now a development configuration, VM startup costs are not critical. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use FLAG_SET_ERGO_IF_DEFAULT Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25121#issuecomment-2877108155 From mchevalier at openjdk.org Tue May 13 16:03:09 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 13 May 2025 16:03:09 GMT Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 [v2] In-Reply-To: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> Message-ID: > On Linux, `assert` and such eventually use `abort` which give the return code 134 (128 + 6 (code of SIGABRT/SIGIOT)). On Windows, dying returns `-1` (exit code are more-or-less-signed int on Windows): > > https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 > > So let's make the IR framework aware of this: we consider there was a JVM error if the OS is windows and the return code -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing. 
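The exit-code rule described above can be sketched as a small standalone Java class. This is only an illustration under stated assumptions: the class name `JvmExitCodeCheck`, the method `isJvmError`, and the constant names are hypothetical and are not taken from `TestVMProcess.java`; the OS probe via the `os.name` property stands in for the test library's `Platform.isWindows` that the updated PR mentions.

```java
import java.util.Locale;

// Hypothetical sketch, not the IR framework code: classify a test VM exit code
// as a JVM error using the rule stated in the PR description.
public class JvmExitCodeCheck {
    // 128 + 6 (SIGABRT): exit code seen after abort() on Linux, per the description.
    static final int ABORT_EXIT_CODE = 128 + 6;
    // Exit code reported when the VM dies on Windows, per the description.
    static final int WINDOWS_DEATH_EXIT_CODE = -1;

    // Returns true if the exit code indicates a JVM error on the given OS family.
    static boolean isJvmError(int exitCode, boolean osIsWindows) {
        int expected = osIsWindows ? WINDOWS_DEATH_EXIT_CODE : ABORT_EXIT_CODE;
        return exitCode == expected;
    }

    public static void main(String[] args) {
        // Assumption: a plain os.name check replaces Platform.isWindows here,
        // so that the sketch stays self-contained.
        String osName = System.getProperty("os.name", "");
        boolean osIsWindows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        System.out.println("exit code 134 treated as JVM error: "
                + isJvmError(134, osIsWindows));
    }
}
```

Passing the OS flag as a parameter keeps the rule easy to exercise for both platforms from a single host.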
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Use Platform.isWindows ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25200/files - new: https://git.openjdk.org/jdk/pull/25200/files/7087c5d3..5759a146 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25200&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25200/head:pull/25200 PR: https://git.openjdk.org/jdk/pull/25200 From rcastanedalo at openjdk.org Tue May 13 16:03:43 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 13 May 2025 16:03:43 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision:

 - Generalize tests by removing requires annotation and adding local applyIf rules
 - Assert that we do not move control nodes
 - Extend comment about hoisting DecodeN inputs
 - Apply Emanuels suggestions to ensure_node_is_at_block_or_above
 - Rename auxiliary functions
 - Rename auxiliary functions
 - Clarify scope of move_into
 - Extend comment about MachTemp nodes
 - Extract and reuse legitimize_address test

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25066/files
  - new: https://git.openjdk.org/jdk/pull/25066/files/dc5aa4fc..6353f42b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=00-01

  Stats: 66 lines in 5 files changed: 21 ins; 19 del; 26 mod
  Patch: https://git.openjdk.org/jdk/pull/25066.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066

PR: https://git.openjdk.org/jdk/pull/25066

From mchevalier at openjdk.org  Tue May 13 16:06:52 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 13 May 2025 16:06:52 GMT
Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 [v2]
In-Reply-To: <3C-T8Ib0RcLtyxhZomQQNAx6Hf0eHKwjdHQk98tHYVs=.6658ceaa-49d2-419c-b1f5-9b5d95a74cd1@github.com>
References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> <3C-T8Ib0RcLtyxhZomQQNAx6Hf0eHKwjdHQk98tHYVs=.6658ceaa-49d2-419c-b1f5-9b5d95a74cd1@github.com>
Message-ID:

On Tue, 13 May 2025 11:38:25 GMT, Christian Hagedorn wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Use Platform.isWindows
>
> test/hotspot/jtreg/compiler/lib/ir_framework/driver/TestVMProcess.java line 262:
>
>> 260:         String stdErr = oa.getStderr();
>> 261:         String stdOut = "";
>> 262:         boolean osIsWindows =
System.getProperty("os.name").toLowerCase().contains("windows"); > > You can use `Platform.isWindows()` instead. Better, thanks. And replaced. (My implementation was actually pretty similar to `Platform.isWindows()`, there seems not to be a much nicer way, but only name matching.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25200#discussion_r2087178535 From rcastanedalo at openjdk.org Tue May 13 16:06:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:06:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 09:51:44 GMT, Emanuel Peter wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 130: > >> 128: Address::offset_ok_for_immed(ref_addr.offset(), exact_log2(size)), >> 129: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); >> 130: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); > > For context: > > 132 /* Sometimes we get misaligned loads and stores, usually from Unsafe > 133 accesses, and these can exceed the offset range. */ > 134 Address legitimize_address(const Address &a, int size, Register scratch) { > 135 if (a.getMode() == Address::base_plus_offset) { > 136 if (! 
Address::offset_ok_for_immed(a.offset(), exact_log2(size))) { > 137 block_comment("legitimize_address {"); > 138 lea(scratch, a); > 139 block_comment("} legitimize_address"); > 140 return Address(scratch); > 141 } > 142 } > 143 return a; > 144 } > > I wonder if it might be worth to create a `legitimize_address_requires_lea` that does the checks. Then you could refactor `legitimize_address` with it, and also use it here. Not sure if it is worth it, but it could ensure that the checks stay in sync. Up to you. Thanks, done (commit 5c7da867). > What about the `MachTemp`? I did not include moving incoming MachTemp nodes so that I could reuse the function across `PhaseCFG::implicit_null_check` without risking behavioral changes. I extended the comment of `move_into` to clarify its scope (commit d6a749e4). > Also: how specific to implicit null checks are your methods `move_into` and `maybe_hoist_into`? If they are not reusable elsewhere, it may be good to give them a more specific name. I changed the name according to your suggestion below, except using "above" instead of "before" which I find more natural when referring to the dominator tree (commits dbe46110 and bcf08f90). > src/hotspot/share/opto/lcm.cpp line 356: > >> 354: if (mach->in(j)->is_MachTemp()) { >> 355: assert(mach->in(j)->outcnt() == 1, "MachTemp nodes should not be shared"); >> 356: // Ignore MachTemp inputs, they can be safely hoisted with the candidate. > > Suggestion: > > // Ignore MachTemp inputs, they can be safely hoisted with the candidate. > // MachTemp have no inputs themselves and are only there to reserve a scratch > // register for the GC barrier of the memory operation. > > That was what you told me in our offline meeting, I thought it was helpful context information. Thanks, added a slightly generalized version (commit 446649a6). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087177975 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087179887 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087178324 From rcastanedalo at openjdk.org Tue May 13 16:17:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:17:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 11:10:05 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/lcm.cpp line 428: >> >>> 426: maybe_hoist_into(val->in(i), block); >>> 427: } >>> 428: move_into(val, block); >> >> Suggestion: >> >> // Inputs of val may already be early enough, but if not move them together with val. >> ensure_node_is_at_block_or_before(val->in(i), block); >> } >> move_node_and_its_projections_to_block(val, block); > > It's a little hard to see here: did you just refactor this code, or make any changes? I just refactored the code (extracted and generalized the logic into the `ensure_node_is_at_block_or_above` and `move_node_and_its_projections_to_block` primitives so that it can be reused by the new logic (dealing with `MachTemp` inputs) and also by other existing logic (hoisting the memory candidate and its flag-killing projections). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087186663 From rcastanedalo at openjdk.org Tue May 13 16:17:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:17:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 10:48:26 GMT, Emanuel Peter wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/share/opto/lcm.cpp line 79: > >> 77: } >> 78: >> 79: void PhaseCFG::move_into(Node* n, Block* b) { > > Suggestion: > > void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { Done. > Do I understand this right: You are looking at some input `n` here, and want to make sure that it is located at `b` or before? Right. > I did not understand what this meant: `sanity check: temp node placement`... Ah, I suppose we are assuming that `n` is a `MachTemp`, and this would have to be placed in a block dominated by b? But could `n` not also be a `load Base`? Could that be a `MachProj`? Just a little confused here. Maybe moving the `b->dominates(current)` assert down helps give good context? But in a sense, it is also a precondition, we can only move `n` up to `b` if `b` dominates `n`... > > Do you have a better idea? 
Right, the comment comes from the original context from which the code is moved, and I guess it should be generalized to make more sense in its new context. I went with your suggestion (commit 793bbe7f). The intention of `ensure_node_is_at_block_or_above` becomes hopefully clear by looking at its callees. > That was what you told me in our offline meeting, I thought it was helpful context information. Thanks, added a slightly generalized version (commit 446649a6). > src/hotspot/share/opto/lcm.cpp line 437: > >> 435: if (n == nullptr || !n->is_MachTemp()) { >> 436: continue; >> 437: } > > Do you want to check that all other nodes already dominate `block`? This is guaranteed by the input domination test in https://github.com/openjdk/jdk/pull/25066/files#diff-6343a8024ec7abfc1bd5e377ba254ed868d97a99258b5af0aab12ecf8f961503R345-R369, so it feels a bit redundant. Let me know if you still think it would be useful to add the assertion. > src/hotspot/share/opto/lcm.cpp line 439: > >> 437: } >> 438: maybe_hoist_into(n, block); >> 439: } > > It seems to me this is definitely new code, ensuring that we move the `MachTemp`. We did not do that before, at least not here. Correct? That's right. > src/hotspot/share/opto/lcm.cpp line 441: > >> 439: map_node_to_block(n, block); >> 440: } >> 441: } > > This now happens in `move_into`, right? Yes. > src/hotspot/share/opto/machnode.hpp line 391: > >> 389: >> 390: // Whether this node is expanded during code emission into a sequence of >> 391: // instructions and the first instruction can perform an implicit null check. > > You may want to put a warning / reasoning here, in case there are multiple loads. > You explained to me offline that a `zLoadP` may have a load at the beginning, but then need to load again if the GC moved the object. I suppose if it was moved, then it cannot be null, and so that should be safe... maybe that is a sufficient argument, what do you think? 
In light of our discussion above I am not sure this warning is needed, the key invariant IMO is that the very first instruction emitted should be able to implement the implicit null check. > test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 51: > >> 49: * @requires vm.gc.Z >> 50: * @run driver compiler.gcbarriers.TestImplicitNullChecks Z >> 51: */ > > Do you think there would be any value in having a run without requirements? Just for general result verification... i.e. that we get the correct NullPointerException. > Of course, you would have to probably add `applyIf` to the `@IR` rules. Sure, done (commit 6353f42b). > test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 140: > >> 138: // G1 and ZGC stores cannot be currently used to implement implicit null >> 139: // checks, because they expand into multiple memory access instructions that >> 140: // are not necessarily located at the initial instruction start address. > > Very random idea, no idea if it is any good: > Why not do the implicit null-check with a fake Load? > No idea on the implications here. I suppose it would be extra code, but at least not branching code? Thanks, but given that doing something theoretically more efficient (addressing the limitation and using the stores for implicit null checking as described in https://github.com/openjdk/jdk/pull/25066#issuecomment-2872870543) did not show any benefit in practice I would not expect any benefit either from implementing the null checks with a synthetic load. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087184637 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087184187 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087195811 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087187130 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087191174 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087187408 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087187699 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087188494 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087190800 From rcastanedalo at openjdk.org Tue May 13 16:17:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 16:17:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 8 May 2025 15:22:33 GMT, Vladimir Kozlov wrote: >> Roberto Casta?eda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/share/opto/lcm.cpp line 95: > >> 93: } >> 94: >> 95: void PhaseCFG::maybe_hoist_into(Node* n, Block* b) { > > Consider adding asserts into these 2 new methods to make sure that they operate only on **data** and not control nodes. 
Thanks, done (commit b198fca8).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087192210

From rcastanedalo at openjdk.org  Tue May 13 16:23:56 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 13 May 2025 16:23:56 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID: <8NRnIxRrMoiLw2RGUzMuiFjiC35mPs53Kp1IKOWLRuI=.44049e5d-48b2-4aac-abe1-27e7b76d8cc5@github.com>

On Thu, 8 May 2025 11:04:26 GMT, Emanuel Peter wrote:

>> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision:
>>
>>  - Generalize tests by removing requires annotation and adding local applyIf rules
>>  - Assert that we do not move control nodes
>>  - Extend comment about hoisting DecodeN inputs
>>  - Apply Emanuels suggestions to ensure_node_is_at_block_or_above
>>  - Rename auxiliary functions
>>  - Rename auxiliary functions
>>  - Clarify scope of move_into
>>  - Extend comment about MachTemp nodes
>>  - Extract and reuse legitimize_address test
>
> test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 119:
>
>> 117:             testLoad(o);
>> 118:         } catch (NullPointerException e) { nullPointerException = true; }
>> 119:         Asserts.assertTrue(nullPointerException);
>
> Suggestion:
>
>         try {
>             testLoad(o);
>             throw new RuntimeException("Should have thrown NullPointerException");
>         } catch (NullPointerException e) { /* expected */}
>
> Could be a shorter alternative. Up to you. Maybe there is a benefit to `Asserts.assertTrue` I am also not aware of?
> But totally optional, as your approach works anyway :)

I rather prefer the current version with `Asserts.assertTrue(nullPointerException)`, because it makes the test expectations more explicit (no need for an `/* expected */` comment or similar).
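
For readers comparing the two idioms discussed above, here is a minimal, self-contained sketch of both. It is not the test from the PR: `NullCheckIdioms`, `Outer`, `flagIdiom` and `throwIdiom` are hypothetical names, `testLoad` is a stand-in for the real test method, and the flag is returned to the caller instead of being passed to the test library's `Asserts.assertTrue`, so that the snippet runs outside jtreg.

```java
// Hypothetical sketch contrasting the two ways of expecting a NullPointerException.
class NullCheckIdioms {
    static class Outer { Object f; }

    // Stand-in for the test method: the field load throws NPE when o is null.
    static Object testLoad(Outer o) { return o.f; }

    // Flag-based idiom (the shape kept in the PR): record whether the NPE was
    // seen and let the caller assert on the flag.
    static boolean flagIdiom(Outer o) {
        boolean nullPointerException = false;
        try {
            testLoad(o);
        } catch (NullPointerException e) { nullPointerException = true; }
        return nullPointerException;
    }

    // Throw-in-try idiom (the suggested alternative): fail from inside the try
    // block if no NPE was thrown.
    static void throwIdiom(Outer o) {
        try {
            testLoad(o);
            throw new RuntimeException("Should have thrown NullPointerException");
        } catch (NullPointerException e) { /* expected */ }
    }

    public static void main(String[] args) {
        if (!flagIdiom(null)) throw new AssertionError("expected an NPE");
        throwIdiom(null); // returns normally because the NPE is thrown and caught
        System.out.println("both idioms accept the null case");
    }
}
```

Both variants reject a non-null argument: the first by returning `false` for the caller to assert on, the second by letting the `RuntimeException` escape.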
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087205670 From qamai at openjdk.org Tue May 13 16:27:00 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 13 May 2025 16:27:00 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 16:03:43 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. 
>> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with nine additional commits since the last revision: > > - Generalize tests by removing requires annotation and adding local applyIf rules > - Assert that we do not move control nodes > - Extend comment about hoisting DecodeN inputs > - Apply Emanuels suggestions to ensure_node_is_at_block_or_above > - Rename auxiliary functions > - Rename auxiliary functions > - Clarify scope of move_into > - Extend comment about MachTemp nodes > - Extract and reuse legitimize_address test src/hotspot/share/opto/output.cpp line 2020: > 2018: assert(access->barrier_data() == 0 || > 2019: access->is_late_expanded_null_check_candidate(), > 2020: "Implicit null checks on memory accesses with barriers are only supported on nodes explicitly marked as null-check candidates"); I assume this is why you want the SIGSEGV instruction to be the first one. Do you think it is better if we mark the whole region and any SIGSEGV from any instruction inside the region will be mapped to this handler. Another way is to make the `MachNode` set the SIGSEGV point themselves. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087211380 From asmehra at openjdk.org Tue May 13 16:27:17 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 13 May 2025 16:27:17 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v5] In-Reply-To: References: Message-ID: <9SgUncbv-K1xeivpenkKeHg0EbjktycNXJp_ThrVfLM=.b2de51b8-468d-41bd-9173-27a22a45b32f@github.com> > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: Update test to make it more resilient Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25019/files - new: https://git.openjdk.org/jdk/pull/25019/files/98e5fa07..c7341cde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=03-04 Stats: 53 lines in 1 file changed: 48 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25019/head:pull/25019 PR: https://git.openjdk.org/jdk/pull/25019 From aph at openjdk.org Tue May 13 16:27:41 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 13 May 2025 16:27:41 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v5] In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> 
Message-ID:

> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op

Andrew Haley has updated the pull request incrementally with two additional commits since the last revision:

 - Copyright
 - Remove AArch64 exception from native threshold

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25147/files
  - new: https://git.openjdk.org/jdk/pull/25147/files/23838f8b..87eadb40

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=03-04

  Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/25147.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147

PR: https://git.openjdk.org/jdk/pull/25147

From aph at openjdk.org  Tue May 13 16:27:41 2025
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 13 May 2025 16:27:41 GMT
Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v2]
In-Reply-To: <6iRBKELx-Xnkgfm4MbONISmz9D3SjO_38PlaQXijv7w=.d6cb630c-7e83-4695-9ccc-6cfa30da5e17@github.com>
References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> <6iRBKELx-Xnkgfm4MbONISmz9D3SjO_38PlaQXijv7w=.d6cb630c-7e83-4695-9ccc-6cfa30da5e17@github.com>
Message-ID:

On Mon, 12 May 2025 09:39:48 GMT, Andrew Haley wrote:

> > Looking at the improvements made, I suggest we also
change (in `SegmentBulkOperations`):
> > ```
> > private static final int NATIVE_THRESHOLD_FILL = powerOfPropertyOr("fill", Architecture.isAARCH64() ? 18 : 5);
> > ```
> > to
> > ```
> > private static final int NATIVE_THRESHOLD_FILL = powerOfPropertyOr("fill", 5);
> > ```
>
> Possibly so, yes, but I'm still looking at the reasons for the differences.

OK, I've done that. Numbers below, and I think that makes the cut between Java code and the intrinsic at the right place.

Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   10  1.522 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   10  1.385 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   10  1.387 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   10  1.520 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   10  1.530 ± 0.007  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   10  1.523 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   10  1.538 ± 0.010  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   10  1.663 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   10  1.940 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   10  1.812 ± 0.007  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   10  2.332 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   10  2.157 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   10  3.857 ± 0.057  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   10  3.506 ± 0.015  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   10  1.522 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   10  1.384 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   10  1.386 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   10  1.520 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   10  1.528 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   10  1.525 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   10  1.533 ± 0.003  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   10  1.665 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   10  1.941 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   10  1.811 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   10  2.332 ± 0.006  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   10  2.152 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   10  5.040 ± 0.041  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   10  4.859 ± 0.040  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2877178492

From lmesnik at openjdk.org  Tue May 13 16:34:53 2025
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Tue, 13 May 2025 16:34:53 GMT
Subject: RFR: 8356702: CTW: Update modules [v2]
In-Reply-To: <3s1wwqmKLRzxZw2FL-s48FvSEcmzlYle_qZBn7YYvOE=.a09bdd82-7186-405a-a076-a2934b9bb3b3@github.com>
References: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> <3s1wwqmKLRzxZw2FL-s48FvSEcmzlYle_qZBn7YYvOE=.a09bdd82-7186-405a-a076-a2934b9bb3b3@github.com>
Message-ID:

On Tue, 13 May 2025 07:24:14 GMT, Evgeny Nikitin wrote:

>> This PR enhances the CTW test wrappers generator in order to make it more user-friendly. Added features are:
>>
>> 1. Automatic scanning for the modules list under `open/src`;
>> 2. Automatic recognition of the current year;
>> 3. Multi-wrapper modules support (allows for splitting huge modules into 2 or more wrappers);
>> 4. Ability to exclude modules.
>>
>> The updated generator has been used to refresh JTReg module wrappers.
>> The most meaningful change is contained in `generate.bash`.
>> Testing: `open/test/hotspot/jtreg/applications/ctw/modules` with the supported platforms, no failures spotted.
>
> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert "Update modified wrappers"
>
>   This reverts commit d7122ccbf3b03a3c43917656ad209624910f6230.

Assuming that you generate the same files as we have now, looks fine.
------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25175#pullrequestreview-2837441481 From dlong at openjdk.org Tue May 13 16:48:52 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 13 May 2025 16:48:52 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder In-Reply-To: References: Message-ID: On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote: > Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. > > Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers` to rather use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag is used in testing since october 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default, and instead, add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improve the performance to match the ones of C1 (both for direct array allocation, and `StringBuilder` construction). 
For instance, with the case given in the JBS issue:
>
> Stop at level 0
> CompileCommand: compileonly C.test* bool compileonly = true
>
> real 0m4,277s
> user 0m4,214s
> sys 0m0,117s
>
> Stop at level 1
> CompileCommand: compileonly C.test* bool compileonly = true
>
> real 0m4,104s
> user 0m4,079s
> sys 0m0,106s
>
> Stop at level 2
> CompileCommand: compileonly C.test* bool compileonly = true
>
> real 0m4,308s
> user 0m4,239s
> sys 0m0,145s
>
> Stop at level 3
> CompileCommand: compileonly C.test* bool compileonly = true
>
> real 0m4,304s
> user 0m4,247s
> sys 0m0,132s
>
> Default (Stop at level 4)
> CompileCommand: compileonly C.test* bool compileonly = true
>
> real 0m4,086s
> user 0m4,059s
> sys 0m0,122s
>
>
>
> I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`.

This change looks good, but do we still need StressCompiledExceptionHandlers after this? The only other use is in JavaThread::handle_async_exception, and it looks like it should be changed too because it's dealing with the same issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25149#issuecomment-2877267621

From mchevalier at openjdk.org  Tue May 13 17:07:52 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 13 May 2025 17:07:52 GMT
Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote:

> Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation.
>
> Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers.
This is an old workaround, and the flag `StressCompiledExceptionHandlers` to use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag has been used in testing since October 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default and add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improves performance to match that of C1 (both for direct array allocation, and `StringBuilder` construction). For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), and everything seems fine, ignoring the usual noise. I've checked with @dougxc: it shouldn't impact Graal, as it doesn't use `OptoRuntime`. I've found a failing test (maybe even two) related to asynchronous exceptions when using this flag. I suspect something is wrong there ([JDK-8356648](https://bugs.openjdk.org/browse/JDK-8356648)). It's also not clear to me whether the failure is directly linked to faulty exception handlers: asynchronous exceptions have a complicated (to me) handshake mechanism and maybe it's faulty without the deopts (or deopts make it less likely).
Unlike in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) (that added `StressCompiledExceptionHandlers`), I'm not hitting the assert "missing exception handler" (that still exists). Aside from being guarded by the same flag, the relation is not clear to me. For allocations, I couldn't find any problem, and the logic seems simpler. I think it's fine to still use the one that works. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25149#issuecomment-2877316276 From rcastanedalo at openjdk.org Tue May 13 17:24:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:24:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v2] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 16:24:25 GMT, Quan Anh Mai wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with nine additional commits since the last revision: >> >> - Generalize tests by removing requires annotation and adding local applyIf rules >> - Assert that we do not move control nodes >> - Extend comment about hoisting DecodeN inputs >> - Apply Emanuels suggestions to ensure_node_is_at_block_or_above >> - Rename auxiliary functions >> - Rename auxiliary functions >> - Clarify scope of move_into >> - Extend comment about MachTemp nodes >> - Extract and reuse legitimize_address test > > src/hotspot/share/opto/output.cpp line 2020: > >> 2018: assert(access->barrier_data() == 0 || >> 2019: access->is_late_expanded_null_check_candidate(), >> 2020: "Implicit null checks on memory accesses with barriers are only supported on nodes explicitly marked as null-check candidates"); > > I assume this is why you want the SIGSEGV instruction to be the first one. Do you think it would be better to mark the whole region, so that any SIGSEGV from any instruction inside the region is mapped to this handler?
Another way is to make the `MachNode` set the SIGSEGV point themselves. Thanks, both could be done, but require non-trivial changes to the exception table building logic for no apparent benefit. I actually prototyped your second suggestion [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks) some time ago so that I could also use ZGC and G1 writes as implicit null checks, but the experiments did not show any performance benefit that could justify the additional complexity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2087316762 From rcastanedalo at openjdk.org Tue May 13 17:40:40 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:40:40 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Extend comments in zLoadP implementations to explain role of reload ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/6353f42b..20d960e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=01-02 Stats: 12 lines in 2 files changed: 8 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Tue May 13 17:44:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:44:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 08:53:08 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz Thanks for the explanations! >> I have no idea how the GC barriers work, and what addresses they load from. So I just had a list of questions run through my mind, about what could possibly go wrong. But the questions are more speculations, because I really have no idea what the GC barriers do. >> >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? >> >> Ideally, we would have some sort of semi-formal proof, to guarantee that if we did ever encounter a null-pointer, we would have to encounter it already on that first load.
> >> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? > > Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). > > Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). > @robcasloz Alright, to me this sounds convincing. I suggest you add a comment about this assumption, i.e. that the address we load from is always the same. Thanks, I added comments to the zLoadP implementations (commit 20d960e6). > And then let a GC engineer have a look at this PR, to confirm that this assumption is always correct, and that there is not some other path where the address could change ;) Absolutely, after getting approval from the compiler side, I will request a formal review from the GC side. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2877434993 From rcastanedalo at openjdk.org Tue May 13 17:44:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 13 May 2025 17:44:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 09:56:47 GMT, Emanuel Peter wrote: >>> I think I need to have a look at the GC barrier code myself, to see which things are constant and which things can be mutated (possibly by another thread). What code / documentation do you recommend I look at? >> >> Regarding code, I recommend you starting [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/z_x86_64.ad#L126-L129) and following `z_load_barrier`. The slow barrier path is generated in a stub [here](https://github.com/openjdk/jdk/blob/522c7b446fef17a8400bc589c55b161e939770cc/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L1217-L1235). >> >> Regarding documentation, you might have a look at the [TOPLAS paper](https://dl.acm.org/doi/full/10.1145/3538532) (which is unfortunately a bit outdated because it only covers non-generational ZGC, but might still offer some intuition that is valid for the latest ZGC version, in particular regarding concurrent relocation and load barriers), the [Generational ZGC JEP](https://openjdk.org/jeps/439), or one of the numerous presentations available on YouTube (e.g. I found the overview in https://www.youtube.com/watch?v=YyXjC68l8mw&t=864s pretty useful). > > @robcasloz Alright, to me this sounds convincing. I suggest you add a comment about this assumption, i.e. that the address we load from is always the same. 
And then let a GC engineer have a look at this PR, to confirm that this assumption is always correct, and that there is not some other path where the address could change ;) @eme64 @vnkozlov Thank you for your thorough comments and suggestions, I believe I have addressed all of them in the latest version. Please re-review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2877438612 From asmehra at openjdk.org Tue May 13 17:48:58 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 13 May 2025 17:48:58 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v4] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 23:04:54 GMT, Vladimir Kozlov wrote: >> Ashutosh Mehra has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove more unused code >> >> Signed-off-by: Ashutosh Mehra >> - Fix whitespace issue. Remove unused code. >> >> Signed-off-by: Ashutosh Mehra > > src/hotspot/share/code/aotCodeCache.hpp line 374: > >> 372: >> 373: static bool is_dumping_stubs() NOT_CDS_RETURN_(false); >> 374: static bool is_using_stubs() NOT_CDS_RETURN_(false); > > We have singular naming (`is_dumping_stub()`) for these methods in `premain` branch. > I was debating to do separate RFE for renaming in mainline or may be you can do it here. > It is up to you. > I did not pay attention to these when I work on adapter caching. But now I have to merge from mainline to premain and I noticed difference. I will make this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2087351871 From kvn at openjdk.org Tue May 13 17:58:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 May 2025 17:58:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 17:40:40 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Extend comments in zLoadP implementations to explain role of reload Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2837672919 From asmehra at openjdk.org Tue May 13 18:03:09 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 13 May 2025 18:03:09 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: Message-ID: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into preserve-runtime-blobs-master - Address Vladimir's comments Signed-off-by: Ashutosh Mehra - Update test to make it more resilient Signed-off-by: Ashutosh Mehra - Remove more unused code Signed-off-by: Ashutosh Mehra - Fix whitespace issue. 
Remove unused code. Signed-off-by: Ashutosh Mehra - Add test for using AOTCodeCache with different CompressedOops configuration Signed-off-by: Ashutosh Mehra - Add check for compressed oops base address; minor refacotring Signed-off-by: Ashutosh Mehra - Merge branch 'master' into preserve-runtime-blobs-master - Address Vladimir's comments Signed-off-by: Ashutosh Mehra - Remove irrelevant comment Signed-off-by: Ashutosh Mehra - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 ------------- Changes: https://git.openjdk.org/jdk/pull/25019/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25019&range=05 Stats: 1360 lines in 31 files changed: 1100 ins; 125 del; 135 mod Patch: https://git.openjdk.org/jdk/pull/25019.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25019/head:pull/25019 PR: https://git.openjdk.org/jdk/pull/25019 From asmehra at openjdk.org Tue May 13 18:07:59 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 13 May 2025 18:07:59 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <8W_FRkLbamdZ6l0Lkbn8WqXv_JXPjG-i5hBus2foor4=.4f80cd55-4141-46ff-8436-0cbbc9228461@github.com> Message-ID: On Mon, 12 May 2025 23:07:02 GMT, Vladimir Kozlov wrote: >>> I think for these changes we should not use AOT code when the heap base does not match. >> Something changed in compressed oops code which prevents enforcing encoding. >> We can investigate and fix it later. >> >> @vnkozlov for this PR we are relying on having relocation for COOP base, not on enforcing encoding. And that should be able to handle cases where heap base is different in assembly vs prod. Why do you suggest to not use AOT code when the heap base does not match? > > @ashu-mehra, this looks good with few comments. After you address them, please merge latest jdk - I pushed small related change to limit platforms to run with AOT. > > After that I will submit new testing. @vnkozlov addressed your comments. 
I also noticed the newly added test `AOTCodeCompressedOopsTest` was consistently failing on macos-aarch64 because for some reason CompressedOops::shift is 3 even for a heap size as low as Xmx128m. So I have updated the test to make it more resilient by reading the 'actual' base and shift from Xlog:cds output. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2877494411 From mli at openjdk.org Tue May 13 19:46:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 13 May 2025 19:46:54 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub In-Reply-To: References: Message-ID: On Tue, 13 May 2025 12:53:40 GMT, Anjian-Wen wrote: > The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and INT granules in order. This could trigger misaligned accesses. We should copy the remainder in this order: INT, SHORT, and BYTE to avoid such an issue. > > JMH data on P550 SBC for reference (@Param("15") private int size): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op > ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op > ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op > ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op Nice catch and fix. Looks good.
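[Editorial illustration] To make the granule-order argument in the 8356869 description above concrete, here is a minimal sketch in Java. It is only a model and not the stub's code: `TailFillOrder` and `allStoresAligned` are hypothetical names, and the sketch assumes the tail (0 to 7 bytes) starts at an 8-byte-aligned address and is written with at most one store per granule size.

```java
public class TailFillOrder {
    // Hypothetical model: returns true if every store needed to fill a tail of
    // tailBytes bytes (0..7) lands at an offset aligned to its own size,
    // assuming the tail starts at an 8-byte-aligned base (offset 0).
    static boolean allStoresAligned(int tailBytes, int[] granuleOrder) {
        int offset = 0;
        for (int size : granuleOrder) {
            // A granule of this size is needed only if its bit is set in the tail length.
            if ((tailBytes & size) != 0) {
                if (offset % size != 0) {
                    return false; // this store would be misaligned
                }
                offset += size;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] smallestFirst = {1, 2, 4}; // BYTE, SHORT, INT
        int[] largestFirst = {4, 2, 1};  // INT, SHORT, BYTE
        for (int n = 0; n < 8; n++) {
            System.out.println("tail=" + n
                + " BYTE/SHORT/INT aligned=" + allStoresAligned(n, smallestFirst)
                + " INT/SHORT/BYTE aligned=" + allStoresAligned(n, largestFirst));
        }
    }
}
```

In this model the INT, SHORT, BYTE order keeps every store aligned for all tail lengths, while the BYTE, SHORT, INT order misaligns a store for tails of 3, 5, 6 and 7 bytes.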
Can you add the test case `@Param("15")` and maybe some more to ArrayFill.java? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25210#issuecomment-2877756163 From dlong at openjdk.org Tue May 13 20:36:53 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 13 May 2025 20:36:53 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 09:25:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode.
Unfortunately, this check did not include `areturn` as a bytecode that does not fall through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that do not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. >> >> # Testing >> >> - [x] [GitHub Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add jsr to falls_through() Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25118#pullrequestreview-2838054894 From dlong at openjdk.org Tue May 13 20:36:53 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 13 May 2025 20:36:53 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 12:39:15 GMT, Manuel Hässig wrote: >> Ah, I think there is a misunderstanding: >> I am saying that `if_cmp` does not always continue. Your statement seems to suggest that all the ones you return `true` for "continue at the next bytecode". That is missing some nuance. They `can` continue to there, but they do not always.
I'm just asking for the wording to be more precise. >> >> You may even want to change the name of the whole function. `falls_through` suggests that they would always fall through. But you are rather asking for "does not continue at next bci", `has_no_fallthrough` or similar. >> >> I leave it up to you if / what you want to do here :) > > Ah, I got confused again between "should be in `falls_through()`" and "can trigger this assert". `if_cmp` cannot trigger the assert, but could be in `falls_through()`. Since it can not fail as per our current understanding and because it does not fit the current semantics of `falls_through()` I would opt to leave it as it is. Yes, the function name and comment aren't as precise as they could be. "parsing" might have been a better word than "execution", for example. But I think the function name is OK as it is. We use variants of this naming elsewhere for exactly the same purpose: https://github.com/openjdk/jdk/blob/e7ce661adb01fba4bb690d51cc2858c822008654/src/hotspot/share/oops/generateOopMap.cpp#L1159 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25118#discussion_r2087587281 From iveresov at openjdk.org Tue May 13 20:42:07 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 20:42:07 GMT Subject: RFR: 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Message-ID: Make sure we don't try to emit cast type profiling if TypeProfileCast == false ------------- Commit messages: - Harden the test - Check TypeProfileCast before emitting type profiling for casts Changes: https://git.openjdk.org/jdk/pull/25218/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25218&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356885 Stats: 90 lines in 2 files changed: 87 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25218.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25218/head:pull/25218 PR: https://git.openjdk.org/jdk/pull/25218 From vlivanov at openjdk.org Tue 
May 13 20:42:07 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 13 May 2025 20:42:07 GMT Subject: RFR: 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off In-Reply-To: References: Message-ID: On Tue, 13 May 2025 20:30:54 GMT, Igor Veresov wrote: > Make sure we don't try to emit cast type profiling if TypeProfileCast == false Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25218#pullrequestreview-2838063598 From kvn at openjdk.org Tue May 13 20:44:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 May 2025 20:44:56 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: Message-ID: <4S-Dtf0DIhtmch16UNF_ZpmnEZmRG1HsHukz6WxOkvs=.2a258cbb-79ec-4531-b308-6789bdda2a52@github.com> On Tue, 13 May 2025 18:03:09 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Update test to make it more resilient > > Signed-off-by: Ashutosh Mehra > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. 
> > Signed-off-by: Ashutosh Mehra > - Add test for using AOTCodeCache with different CompressedOops > configuration > > Signed-off-by: Ashutosh Mehra > - Add check for compressed oops base address; minor refacotring > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 Looks good. I submitted testing. ------------- PR Review: https://git.openjdk.org/jdk/pull/25019#pullrequestreview-2838072325 From kvn at openjdk.org Tue May 13 21:14:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 13 May 2025 21:14:54 GMT Subject: RFR: 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off In-Reply-To: References: Message-ID: On Tue, 13 May 2025 20:30:54 GMT, Igor Veresov wrote: > Make sure we don't try to emit cast type profiling if TypeProfileCast == false Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25218#pullrequestreview-2838140864 From iveresov at openjdk.org Tue May 13 21:53:05 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 21:53:05 GMT Subject: Integrated: 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off In-Reply-To: References: Message-ID: On Tue, 13 May 2025 20:30:54 GMT, Igor Veresov wrote: > Make sure we don't try to emit cast type profiling if TypeProfileCast == false This pull request has now been integrated. 
Changeset: 89242eec Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/89242eecd2f381608f78bd8c431eca389956e79a Stats: 90 lines in 2 files changed: 87 ins; 0 del; 3 mod 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Reviewed-by: vlivanov, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25218 From iveresov at openjdk.org Tue May 13 22:37:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 22:37:55 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v18] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with 34 additional commits since the last revision: - Address Ioi's comments - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Reviewed-by: vlivanov, kvn - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing Reviewed-by: naoto - 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Reviewed-by: wkemper - 8356819: [macos] MacSign should use "openssl" and "faketime" from Homebrew by default Reviewed-by: asemenyuk - 8356107: [java.lang] Use @requires tag instead of exiting based on os.name or separatorChar property Reviewed-by: naoto, bpb - 8356447: Change default for EagerJVMCI to true Reviewed-by: yzheng, kvn, never - 8351415: (fs) Path::toAbsolutePath should specify if an absolute path has a root component Reviewed-by: alanb - 8356551: Javac rejects receiver parameter in constructor of local class in early construction context Reviewed-by: mcimadamore - 8355992: Add 
unsignedMultiplyExact and *powExact methods to Math and StrictMath Reviewed-by: darcy - ... and 24 more: https://git.openjdk.org/jdk/compare/da4a3420...72030a30 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/da4a3420..72030a30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=16-17 Stats: 7451 lines in 145 files changed: 4269 ins; 1931 del; 1251 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iveresov at openjdk.org Tue May 13 22:40:42 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 22:40:42 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v19] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 82 commits: - Merge branch 'master' into pp2 - Address Ioi's comments - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Reviewed-by: vlivanov, kvn - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing Reviewed-by: naoto - 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Reviewed-by: wkemper - 8356819: [macos] MacSign should use "openssl" and "faketime" from Homebrew by default Reviewed-by: asemenyuk - 8356107: [java.lang] Use @requires tag instead of exiting based on os.name or separatorChar property Reviewed-by: naoto, bpb - 8356447: Change default for EagerJVMCI to true Reviewed-by: yzheng, kvn, never - 8351415: (fs) Path::toAbsolutePath should specify if an absolute path has a root component Reviewed-by: alanb - 8356551: Javac rejects receiver parameter in constructor of local class in early construction context Reviewed-by: mcimadamore - ... and 72 more: https://git.openjdk.org/jdk/compare/10dcdf1b...1669f900 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=18 Stats: 3330 lines in 59 files changed: 3116 ins; 100 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iveresov at openjdk.org Tue May 13 22:44:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 22:44:55 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v14] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 07:42:33 GMT, Ioi Lam wrote: >> Do you want me to leave the existing `log_info` alone? Or should I fix everything in `FileMapHeader::validate()` ? > > You can leave the existing code and just fix the new code you added. Done. And changed the test. Please take a look. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2087730827 From iklam at openjdk.org Tue May 13 23:15:57 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 13 May 2025 23:15:57 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v19] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 22:40:42 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 82 commits: > > - Merge branch 'master' into pp2 > - Address Ioi's comments > - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off > > Reviewed-by: vlivanov, kvn > - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing > > Reviewed-by: naoto > - 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() > > Reviewed-by: wkemper > - 8356819: [macos] MacSign should use "openssl" and "faketime" from Homebrew by default > > Reviewed-by: asemenyuk > - 8356107: [java.lang] Use @requires tag instead of exiting based on os.name or separatorChar property > > Reviewed-by: naoto, bpb > - 8356447: Change default for EagerJVMCI to true > > Reviewed-by: yzheng, kvn, never > - 8351415: (fs) Path::toAbsolutePath should specify if an absolute path has a root component > > Reviewed-by: alanb > - 8356551: Javac rejects receiver parameter in constructor of local class in early construction context > > Reviewed-by: mcimadamore > - ... 
and 72 more: https://git.openjdk.org/jdk/compare/10dcdf1b...1669f900 test/hotspot/jtreg/runtime/cds/appcds/aotProfile/AOTProfileFlags.java line 115: > 113: > 114: out = CDSTestUtils.executeAndLog(pb, "production_failure"); > 115: out.shouldContain("does not equal"); Since all the flags have `Profile` in them, I think we should use this to match the intended output: String errorPattern = "Profile.* setting .* does not equal the current .*Profile.* setting"; out.shouldNotMatch(errorPattern); ... out.shouldMatch(errorPattern); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2087753972 From iveresov at openjdk.org Tue May 13 23:15:57 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 23:15:57 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v19] In-Reply-To: References: Message-ID: <-Y1WtA0iFbtvlUbZSni93xpK2TnribKtD-Hfl7YVML4=.85c93976-d723-40a2-b382-238c56a57148@github.com> On Tue, 13 May 2025 23:10:53 GMT, Ioi Lam wrote: >> Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 82 commits: >> >> - Merge branch 'master' into pp2 >> - Address Ioi's comments >> - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off >> >> Reviewed-by: vlivanov, kvn >> - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing >> >> Reviewed-by: naoto >> - 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() >> >> Reviewed-by: wkemper >> - 8356819: [macos] MacSign should use "openssl" and "faketime" from Homebrew by default >> >> Reviewed-by: asemenyuk >> - 8356107: [java.lang] Use @requires tag instead of exiting based on os.name or separatorChar property >> >> Reviewed-by: naoto, bpb >> - 8356447: Change default for EagerJVMCI to true >> >> Reviewed-by: yzheng, kvn, never >> - 8351415: (fs) Path::toAbsolutePath should specify if an absolute path has a root component >> >> Reviewed-by: alanb >> - 8356551: Javac rejects receiver parameter in constructor of local class in early construction context >> >> Reviewed-by: mcimadamore >> - ... and 72 more: https://git.openjdk.org/jdk/compare/10dcdf1b...1669f900 > > test/hotspot/jtreg/runtime/cds/appcds/aotProfile/AOTProfileFlags.java line 115: > >> 113: >> 114: out = CDSTestUtils.executeAndLog(pb, "production_failure"); >> 115: out.shouldContain("does not equal"); > > Since all the flags have `Profile` in them, I think we should use this to match the intended output: > > > String errorPattern = "Profile.* setting .* does not equal the current .*Profile.* setting"; > out.shouldNotMatch(errorPattern); > ... > out.shouldMatch(errorPattern); `SpecTrapLimitExtraEntries` does not. 
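A minimal sketch of the objection above: the suggested regex matches only when the flag names contain `Profile`, so a mismatch reported for `SpecTrapLimitExtraEntries` would not be caught by it. The two message strings are hypothetical, modeled on the pattern quoted in the review comment rather than copied from HotSpot output, and the class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class ErrorPatternCheck {
    // Pattern quoted from the review comment.
    static final Pattern ERROR_PATTERN = Pattern.compile(
        "Profile.* setting .* does not equal the current .*Profile.* setting");

    // Returns true if the pattern occurs anywhere in the given output text.
    static boolean matches(String output) {
        return ERROR_PATTERN.matcher(output).find();
    }

    public static void main(String[] args) {
        // Hypothetical messages; the exact wording is an assumption.
        String profileFlag = "TypeProfileLevel setting (0) does not equal "
            + "the current TypeProfileLevel setting (222)";
        String otherFlag = "SpecTrapLimitExtraEntries setting (3) does not equal "
            + "the current SpecTrapLimitExtraEntries setting (4)";

        System.out.println("Profile flag matched: " + matches(profileFlag));
        System.out.println("SpecTrapLimitExtraEntries matched: " + matches(otherFlag));
    }
}
```

Under these assumed messages, a check built on this pattern would pass for the `Profile` flags and miss the `SpecTrapLimitExtraEntries` case, which is why a less specific pattern (or a per-flag pattern) may be needed.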
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2087755889 From iveresov at openjdk.org Tue May 13 23:46:42 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 23:46:42 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v20] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Address Ioi's comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/1669f900..fd26cfe4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=18-19 Stats: 12 lines in 1 file changed: 2 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From iklam at openjdk.org Tue May 13 23:46:42 2025 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 13 May 2025 23:46:42 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v20] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 23:43:01 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. 
Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. >> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: > > Address Ioi's comments updates are good! ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2838347038 From iveresov at openjdk.org Tue May 13 23:46:42 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 13 May 2025 23:46:42 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v19] In-Reply-To: <-Y1WtA0iFbtvlUbZSni93xpK2TnribKtD-Hfl7YVML4=.85c93976-d723-40a2-b382-238c56a57148@github.com> References: <-Y1WtA0iFbtvlUbZSni93xpK2TnribKtD-Hfl7YVML4=.85c93976-d723-40a2-b382-238c56a57148@github.com> Message-ID: On Tue, 13 May 2025 23:13:42 GMT, Igor Veresov wrote: >> test/hotspot/jtreg/runtime/cds/appcds/aotProfile/AOTProfileFlags.java line 115: >> >>> 113: >>> 114: out = CDSTestUtils.executeAndLog(pb, "production_failure"); >>> 115: out.shouldContain("does not equal"); >> >> Since all the flags have `Profile` in them, I think we should use this to match the intended output: >> >> >> String errorPattern = "Profile.* setting .* does not equal the current .*Profile.* setting"; >> out.shouldNotMatch(errorPattern); >> ... >> out.shouldMatch(errorPattern); > > `SpecTrapLimitExtraEntries` does not. Fixed. Take a look. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24886#discussion_r2087775255 From fyang at openjdk.org Wed May 14 00:38:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 14 May 2025 00:38:51 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v2] In-Reply-To: References: Message-ID: <4-yFRcNLDHjzdKZ_WK_wHjEr49ADQBoOuQ1u8doVB08=.0bb9788d-ee5e-4999-a687-bdc1f34f4f3f@github.com> On Tue, 13 May 2025 13:26:34 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/riscv_v.ad line 382: >> >>> 380: ins_encode %{ >>> 381: assert(UseZvfh, "must"); >>> 382: BasicType bt = Matcher::vector_element_basic_type(this); >> >> Question: What is `bt` calculated here? Seems there isn't one for HF16 in `enum BasicType` definition in file src/hotspot/share/utilities/globalDefinitions.hpp. I only see `T_FLOAT` and `T_DOUBLE`, which I don't think is usable here as we need to set SEW=16 for this instruction. > > No, it uses T_SHORT instead, in Float16.java it also uses a short as underlying payload. > And if you check the generated assembly code, you'll find some code like `vsetivli t0,16,e16,m1,tu,mu`. > > To avoid confusion, I will add an assertion here so that it can be understood later. Thanks for the answer. I see that is also reflected on the C2 source code [1]. Why not save this `Matcher::vector_element_basic_type(this)` call then? 
I mean: assert(Matcher::vector_element_basic_type(this) == T_SHORT, "must"); __ vsetvli_helper(T_SHORT, Matcher::vector_length(this)); [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L63 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2087822782 From haosun at openjdk.org Wed May 14 01:02:54 2025 From: haosun at openjdk.org (Hao Sun) Date: Wed, 14 May 2025 01:02:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 12:49:32 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments LGTM except one nit. src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 553: > 551: > 552: // vector add - predicated > 553: BINARY_OP_PREDICATE(vaddB, AddVB, sve_add, B) stylistic nit: remove the extra spaces. In the initial commit, this extra space is added due to `vaddHF`. However, in the latest commit the predicated Float16 rules are removed. 
------------- PR Review: https://git.openjdk.org/jdk/pull/25096#pullrequestreview-2838450534 PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2087854380 From xgong at openjdk.org Wed May 14 01:35:50 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 14 May 2025 01:35:50 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: <7fFflMD9iyjyj_v2aGbJH9BD5ZzvHu7wW_NBeos2XBc=.451a7884-e077-459b-835b-d224a433ca48@github.com> On Tue, 13 May 2025 12:49:32 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments LGTM! Thanks! ------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/25096#pullrequestreview-2838483252 From xgong at openjdk.org Wed May 14 02:29:50 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 14 May 2025 02:29:50 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: <2os9gEZk7LXqHQkJ-FTv_8jsGrT9Cy0lAVDYLPgHst0=.6ea1f0d0-67a3-4a6a-9b47-5cb753c3d668@github.com> On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). 
> > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to HotSpot, eliminating duplicate index vectors for gather load instructions on architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines the X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The relevant JMH benchmarks improve by up to 27% on an X86 AVX512 system.
Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... Hi, could anyone please take a look at this PR? Thanks a lot in advance! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2878450580 From duke at openjdk.org Wed May 14 02:44:14 2025 From: duke at openjdk.org (erifan) Date: Wed, 14 May 2025 02:44:14 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: > This patch optimizes the following patterns: > For integer types: > > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. > > For float and double types: > > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: > > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 > testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 > testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 > testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 > testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 > testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 > testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 > testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 > testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 > testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 > testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 > testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 > testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 > testCompareLTMaskNotLong ops/s 856502.26... erifan has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Refactor the JTReg tests for compare.xor(maskAll) Also made a small change to support the pattern `VectorMask.fromLong()`. - Merge branch 'master' into JDK-8354242 - Refactor code Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this optimization, making the code more modular. - Merge branch 'master' into JDK-8354242 - Update the jtreg test - Merge branch 'master' into JDK-8354242 - Addressed some review comments 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. 2. Improve code comments. - Merge branch 'master' into JDK-8354242 - Merge branch 'master' into JDK-8354242 - 8354242: VectorAPI: combine vector not operation with compare This patch optimizes the following patterns: For integer types: ``` (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) => (VectorMaskCmp src1 src2 ncond) (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) => (VectorMaskCmp src1 src2 ncond) ``` cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt; ncond is the negation of cond. For float and double types: ``` (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) ``` cond can be eq or ne.
Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 testCompareLTMaskNotInt ops/s 1672180.09 995.238142 2353757.863 853.774734 1.4 testCompareLTMaskNotLong ops/s 856502.2695 12276.82851 1177671.815 496.723302 1.37 testCompareLTMaskNotShort ops/s 3325798.025 2412.702501 4711554.181 1779.302112 1.41 testCompareNEMaskNotByte ops/s 
7910002.518 2771.82477 10245315.33 16321.93935 1.29 testCompareNEMaskNotDouble ops/s 863754.6022 523.140788 1179133.982 476.572178 1.36 testCompareNEMaskNotFloat ops/s 1723321.883 2598.484803 2358492.186 877.1401 1.36 testCompareNEMaskNotInt ops/s 1670288.841 751.774826 2354158.125 835.720163 1.4 testCompareNEMaskNotLong ops/s 836327.6835 410.525466 1178178.825 308.757932 1.4 testCompareNEMaskNotShort ops/s 3327815.841 1511.978763 4711379.136 2336.505531 1.41 testCompareUGEMaskNotByte ops/s 7906699.024 3200.936474 10253843.74 15067.59401 1.29 testCompareUGEMaskNotInt ops/s 1674003.923 3287.191727 2353340.666 951.381021 1.4 testCompareUGEMaskNotLong ops/s 852424.5562 8920.408939 1177943.609 389.6621 1.38 testCompareUGEMaskNotShort ops/s 3327255.858 1584.885143 4711622.355 1247.215277 1.41 testCompareUGTMaskNotByte ops/s 7909249.189 4435.283667 10245541.34 10993.34739 1.29 testCompareUGTMaskNotInt ops/s 1693713.433 20650.00213 2353153.787 1055.343846 1.38 testCompareUGTMaskNotLong ops/s 851022.3395 7079.065268 1177910.677 538.604598 1.38 testCompareUGTMaskNotShort ops/s 3327236.988 1616.886789 4711209.865 3098.494145 1.41 testCompareULEMaskNotByte ops/s 7909350.825 3251.262342 10261449.03 7273.831341 1.29 testCompareULEMaskNotInt ops/s 1672350.925 1545.304304 2353231.755 914.231193 1.4 testCompareULEMaskNotLong ops/s 853349.4765 9804.906913 1177967.254 435.044367 1.38 testCompareULEMaskNotShort ops/s 3325757.891 1555.062257 4712873.187 1650.986905 1.41 testCompareULTMaskNotByte ops/s 7912218.621 2633.477744 10242095.98 21921.39902 1.29 testCompareULTMaskNotInt ops/s 1673994.849 2672.507666 2353449.22 946.105757 1.4 testCompareULTMaskNotLong ops/s 849032.5868 10406.06689 1177586.047 506.541456 1.38 testCompareULTMaskNotShort ops/s 3328062.026 1892.991844 4713247.216 1855.983724 1.41 ``` With option `-XX:UseSVE=0`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 7895961.919 72712.90804 7746493.731 71481.92938 0.98 
testCompareEQMaskNotDouble ops/s 789811.0455 384.493088 766473.7994 2216.581793 0.97 testCompareEQMaskNotFloat ops/s 1806305.818 638.010451 1819616.613 3295.38958 1 testCompareEQMaskNotInt ops/s 1815820.144 1225.336135 1849538.401 766.29902 1.01 testCompareEQMaskNotLong ops/s 807336.492 335.451807 792732.9483 277.954432 0.98 testCompareEQMaskNotShort ops/s 4818266.38 1927.862665 4668903.001 1922.782715 0.96 testCompareGEMaskNotByte ops/s 7818439.678 75374.97739 16498003.98 41440.49653 2.11 testCompareGEMaskNotInt ops/s 1815159.05 1090.912209 2372095.779 1664.397112 1.3 testCompareGEMaskNotLong ops/s 804324.5575 2301.686878 927919.8507 371.766719 1.15 testCompareGEMaskNotShort ops/s 4818966.563 2443.643652 5385561.038 29558.37423 1.11 testCompareGTMaskNotByte ops/s 7893406.157 82687.74264 16470663.2 22165.55812 2.08 testCompareGTMaskNotInt ops/s 1815316.812 915.894106 2370447.198 655.016338 1.3 testCompareGTMaskNotLong ops/s 807019.456 526.525482 928079.0541 330.582693 1.15 testCompareGTMaskNotShort ops/s 4820552.881 1684.247747 5355902.93 5893.2915 1.11 testCompareLEMaskNotByte ops/s 7816263.323 79560.0015 16473621.19 56688.99585 2.1 testCompareLEMaskNotInt ops/s 1814915.724 926.998625 2368790.306 932.594778 1.3 testCompareLEMaskNotLong ops/s 806483.9 935.718082 928110.9074 407.096695 1.15 testCompareLEMaskNotShort ops/s 4813660.241 6817.870509 5357107.852 10061.47975 1.11 testCompareLTMaskNotByte ops/s 7838948.962 69136.4504 16424405.96 24464.75469 2.09 testCompareLTMaskNotInt ops/s 1815056.833 1187.6453 2369892.187 1103.819634 1.3 testCompareLTMaskNotLong ops/s 806602.1804 287.923365 928346.4118 617.682824 1.15 testCompareLTMaskNotShort ops/s 4817940.643 2767.1509 5372537.84 15397.47169 1.11 testCompareNEMaskNotByte ops/s 9078493.798 4630.339307 16484348.42 18925.88346 1.81 testCompareNEMaskNotDouble ops/s 661769.6272 398.712981 926763.5839 1808.843788 1.4 testCompareNEMaskNotFloat ops/s 1570527.252 563.642144 2312425.678 1815.844846 1.47 testCompareNEMaskNotInt 
ops/s 1619146.58 626.793854 2369711.543 942.330478 1.46 testCompareNEMaskNotLong ops/s 680201.5381 2252.836482 927808.6147 414.917863 1.36 testCompareNEMaskNotShort ops/s 3763508.054 3622.560798 5367808.015 8591.466599 1.42 testCompareUGEMaskNotByte ops/s 7886373.129 75917.74675 16480928.93 27524.31005 2.08 testCompareUGEMaskNotInt ops/s 1815636.832 750.036241 2369683.015 901.609404 1.3 testCompareUGEMaskNotLong ops/s 806862.5826 287.819616 928001.4394 361.063837 1.15 testCompareUGEMaskNotShort ops/s 4820581.361 2098.537435 5375854.248 25619.40165 1.11 testCompareUGTMaskNotByte ops/s 7891591.465 96614.93542 16410405.93 15012.37096 2.07 testCompareUGTMaskNotInt ops/s 1814871.179 662.825588 2371325.903 1170.491164 1.3 testCompareUGTMaskNotLong ops/s 804013.7658 2240.534209 928062.2169 531.306897 1.15 testCompareUGTMaskNotShort ops/s 4818150.337 3051.717685 5381449.337 21212.34187 1.11 testCompareULEMaskNotByte ops/s 7831540.628 81306.67253 16495250.78 38682.19675 2.1 testCompareULEMaskNotInt ops/s 1814484.14 687.860656 2369265.075 940.609586 1.3 testCompareULEMaskNotLong ops/s 807780.5749 769.876816 927538.0732 1278.267724 1.14 testCompareULEMaskNotShort ops/s 4817437.42 5141.336541 5356183.359 7015.608124 1.11 testCompareULTMaskNotByte ops/s 7849078.225 56753.59764 16395975.27 34043.67295 2.08 testCompareULTMaskNotInt ops/s 1814328.226 2697.219111 2370700.47 1991.841988 1.3 testCompareULTMaskNotLong ops/s 807166.8197 253.061506 927926.2803 252.933462 1.14 testCompareULTMaskNotShort ops/s 4821098.216 1625.959044 5348980.243 4100.768121 1.1 ``` Benchmarks on AMD EPYC 9124 16-Core Processor: With option `-XX:UseAVX=3`: ``` Benchmark Unit Before Score Error After Score Error Uplift testCompareEQMaskNotByte ops/s 16607323.35 1233692.631 18381557.66 1163201.522 1.1 testCompareEQMaskNotDouble ops/s 2114285.245 58782.2534 2959946.353 43016.0445 1.39 testCompareEQMaskNotFloat ops/s 4480874.437 89975.29074 6960151.436 64799.143 1.55 testCompareEQMaskNotInt ops/s 4370906.91 
51784.80889 6856955.043 313858.5504 1.56 testCompareEQMaskNotLong ops/s 2080065.895 26762.06732 2939142.143 67179.05314 1.41 testCompareEQMaskNotShort ops/s 7968282.563 210437.2781 12701214.56 473152.6407 1.59 testCompareGEMaskNotByte ops/s 18419141.89 473408.9451 19880059.68 321638.0397 1.07 testCompareGEMaskNotInt ops/s 4419015.62 77352.98633 7037639.227 151066.0383 1.59 testCompareGEMaskNotLong ops/s 2147982.48 49227.42782 3000275.928 39298.75344 1.39 testCompareGEMaskNotShort ops/s 8469039.613 17833.19707 12288229.49 244317.8812 1.45 testCompareGTMaskNotByte ops/s 18728997.5 468328.8358 20544730.05 392264.6466 1.09 testCompareGTMaskNotInt ops/s 4510009.705 78812.57357 7364629.942 70970.78473 1.63 testCompareGTMaskNotLong ops/s 2124104.969 40917.89257 2953536.279 35199.19687 1.39 testCompareGTMaskNotShort ops/s 8690557.621 311534.1159 12344017.51 457931.8741 1.42 testCompareLEMaskNotByte ops/s 17758400.53 478383.4945 19209183.26 1143297.241 1.08 testCompareLEMaskNotInt ops/s 4363664.862 43443.18063 7054093.064 78141.11476 1.61 testCompareLEMaskNotLong ops/s 2068632.213 29844.78023 2954766.412 50667.22502 1.42 testCompareLEMaskNotShort ops/s 8637608.548 183538.5511 12719010.27 473568.8825 1.47 testCompareLTMaskNotByte ops/s 14406138.95 423105.0163 17292417.96 371386.9689 1.2 testCompareLTMaskNotInt ops/s 4546707.266 131977.3144 7040483.394 213590.4657 1.54 testCompareLTMaskNotLong ops/s 2123277.356 47243.21499 2848720.442 58896.97045 1.34 testCompareLTMaskNotShort ops/s 7570169.363 649873.6295 11945383.75 988276.5955 1.57 testCompareNEMaskNotByte ops/s 18274529.55 683396.7384 19081938.8 1118739.778 1.04 testCompareNEMaskNotDouble ops/s 2112533.61 43295.50012 2912115.441 78189.51083 1.37 testCompareNEMaskNotFloat ops/s 4628683.814 93817.07362 6967208.729 145135.8544 1.5 testCompareNEMaskNotInt ops/s 4470900.214 75974.50842 7286913.662 116328.5277 1.62 testCompareNEMaskNotLong ops/s 2134091.061 46377.94061 2934667.477 81675.46021 1.37 testCompareNEMaskNotShort 
ops/s 8790384.287 396161.8599 13076858.35 286272.1155 1.48 testCompareUGEMaskNotByte ops/s 18009150.9 660803.8886 17551258.33 1667014.843 0.97 testCompareUGEMaskNotInt ops/s 4442928.74 83190.81019 6854088.277 329008.8901 1.54 testCompareUGEMaskNotLong ops/s 2088357.736 71696.24791 2973202.26 63278.78974 1.42 testCompareUGEMaskNotShort ops/s 8348624.02 116562.7876 12832250.78 546869.3006 1.53 testCompareUGTMaskNotByte ops/s 17871101.25 800199.6321 19902619.81 214003.3262 1.11 testCompareUGTMaskNotInt ops/s 4088304.421 137797.9723 7135454.33 124553.651 1.74 testCompareUGTMaskNotLong ops/s 2070610.42 19881.82182 2991536.365 36260.60767 1.44 testCompareUGTMaskNotShort ops/s 8637099.341 155822.1608 12756579.77 186068.199 1.47 testCompareULEMaskNotByte ops/s 17940901.36 1258029.364 18932484.94 694554.6305 1.05 testCompareULEMaskNotInt ops/s 4369177.511 74982.31936 6392773.082 550171.2266 1.46 testCompareULEMaskNotLong ops/s 2135905.761 43693.63178 2877579.631 41651.56289 1.34 testCompareULEMaskNotShort ops/s 8607710.544 132655.1676 12446370.04 441718.3035 1.44 testCompareULTMaskNotByte ops/s 17409912.23 1033204.537 20607479.99 362000.5056 1.18 testCompareULTMaskNotInt ops/s 4386455.9 119192.1635 6920123.264 186158.2845 1.57 testCompareULTMaskNotLong ops/s 2064995.149 38622.2734 2988343.589 39037.90006 1.44 testCompareULTMaskNotShort ops/s 8642182.752 230919.2442 13029582.09 437101.4923 1.5 ``` The small amount of performance degradation is due to test fluctuations. 
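The rewrite in this PR rests on each integer comparison having an exact negation, so that `not(cmp(a, b, cond))` can become `cmp(a, b, ncond)`. The scalar sketch below is only an illustration of that property, not the C2 implementation: the `Cond` enum and the helper methods are hypothetical names. It also shows one likely reason the float/double patterns are limited to eq and ne: with a NaN operand, `!(a < b)` is not the same as `a >= b`.

```java
public class NegatedCompareSketch {
    // Hypothetical scalar model of the conditions listed in the PR description.
    enum Cond {
        EQ, NE, LT, GE, GT, LE, ULT, UGE, UGT, ULE;

        // The negated condition: !(a cond b) == (a ncond b) for integers.
        Cond negate() {
            switch (this) {
                case EQ:  return NE;
                case NE:  return EQ;
                case LT:  return GE;
                case GE:  return LT;
                case GT:  return LE;
                case LE:  return GT;
                case ULT: return UGE;
                case UGE: return ULT;
                case UGT: return ULE;
                default:  return UGT; // ULE
            }
        }

        // Evaluate the condition on two ints (unsigned variants use compareUnsigned).
        boolean test(int a, int b) {
            switch (this) {
                case EQ:  return a == b;
                case NE:  return a != b;
                case LT:  return a < b;
                case GE:  return a >= b;
                case GT:  return a > b;
                case LE:  return a <= b;
                case ULT: return Integer.compareUnsigned(a, b) < 0;
                case UGE: return Integer.compareUnsigned(a, b) >= 0;
                case UGT: return Integer.compareUnsigned(a, b) > 0;
                default:  return Integer.compareUnsigned(a, b) <= 0; // ULE
            }
        }
    }

    // Checks !(a cond b) == (a ncond b) for every condition over a few edge values.
    static boolean negationHoldsForInts() {
        int[] samples = { Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE };
        for (Cond c : Cond.values()) {
            for (int a : samples) {
                for (int b : samples) {
                    if ((!c.test(a, b)) != c.negate().test(a, b)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    // With NaN, !(a < b) is true while (a >= b) is false, so lt cannot become ge.
    static boolean ltNegationBreaksOnNaN() {
        float a = Float.NaN, b = 1.0f;
        return (!(a < b)) != (a >= b);
    }

    // eq/ne stay exact negations of each other even with NaN.
    static boolean eqNegationHoldsOnNaN() {
        float a = Float.NaN, b = 1.0f;
        return (!(a == b)) == (a != b);
    }

    public static void main(String[] args) {
        System.out.println("int negation holds: " + negationHoldsForInts());
        System.out.println("float lt negation breaks on NaN: " + ltNegationBreaksOnNaN());
        System.out.println("float eq negation holds on NaN: " + eqNegationHoldsOnNaN());
    }
}
```

The sample values are edge cases chosen for illustration; the jtreg test in the PR remains the authoritative check.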
------------- Changes: - all: https://git.openjdk.org/jdk/pull/24674/files - new: https://git.openjdk.org/jdk/pull/24674/files/001fac0f..f2f71e34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24674&range=04-05 Stats: 20102 lines in 755 files changed: 11515 ins; 4932 del; 3655 mod Patch: https://git.openjdk.org/jdk/pull/24674.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24674/head:pull/24674 PR: https://git.openjdk.org/jdk/pull/24674 From duke at openjdk.org Wed May 14 02:44:14 2025 From: duke at openjdk.org (erifan) Date: Wed, 14 May 2025 02:44:14 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 09:37:22 GMT, erifan wrote: >> Yes, converting `VectorMask.fromLong(SPECIES, -1L)` to `MaskAll()` would be better, and that will benefit AArch64 as well, since `MaskAll()` is much more cheaper than `fromLong()` on AArch64. We can add such a transformation with another PR. > > Ok, I'll extend the test to xor(maskAll(true) in the next commit, thanks! Done, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2087927027 From duke at openjdk.org Wed May 14 02:44:16 2025 From: duke at openjdk.org (erifan) Date: Wed, 14 May 2025 02:44:16 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v5] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:13:23 GMT, Jatin Bhateja wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. 
>> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.156... > > test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 40: > >> 38: @Fork(jvmArgs = { "--add-modules=jdk.incubator.vector" }) >> 39: public class MaskCompareNotBenchmark { >> 40: private static final int ARRAYLEN = 4096; > > ARRAYLEN should be configurable @Param. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2087927310 From duke at openjdk.org Wed May 14 02:50:53 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 14 May 2025 02:50:53 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v9] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the riscv Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic substantially reduces Unsafe setMemory time. > > On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below. > > Before the patch: > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ±
0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: fix bug and delete some useless code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/f57086b1..51654891 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=07-08 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Wed May 14 03:20:56 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 14 May 2025 03:20:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 10:34:02 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build error on mac and windows > > src/hotspot/share/opto/addnode.cpp line 1054: > >> 1052: } else { >> 1053: // not found >> 1054: add_operators_to_worklist(_combine); > > Why are you doing this? If an input still needs to be transformed, then it should be put onto the work list by the inputs of that operator. And not by `combine`, i.e. the use of that operator. > > Plus: if those operators are now transformed, would we actually ever get back here and attempt optimizing again? Your flag is now already set with `set_merge_memops_checked`, so we would not get here again, right? This happens while `_combine` itself is being transformed, so I think it follows the rule "put the inputs of the operator on the worklist". Do I misunderstand? Yes, the "_merge_memops_checked" flag is used to block re-transforming `_combine`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2087954737 From duke at openjdk.org Wed May 14 03:23:41 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 14 May 2025 03:23:41 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub [v2] In-Reply-To: References: Message-ID: > The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. > The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and > INT granules in order. This could trigger misaligned accesses. We should copy the remainings > in this order: INT, SHORT, and BYTE to avoid such an issue. > > JMH data on P550 SBC for reference (@Param("15") private int size): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op > ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op > ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op > ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.492 ±
0.006 ns/op Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: add test for two corner case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25210/files - new: https://git.openjdk.org/jdk/pull/25210/files/9e1d1d44..e9114839 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25210&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25210&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25210.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25210/head:pull/25210 PR: https://git.openjdk.org/jdk/pull/25210 From duke at openjdk.org Wed May 14 03:23:41 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 14 May 2025 03:23:41 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub In-Reply-To: References: Message-ID: <1xs8g9mVhaTz8FIMw3hyEay8PhrOKxvBmPm3rmPKYoA=.99454cd9-078f-4268-9285-389e02f03247@github.com> On Tue, 13 May 2025 19:44:12 GMT, Hamlin Li wrote: >> The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. >> The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and >> INT granules in order. This could trigger misaligned accesses. We should copy the remainings >> in this order: INT, SHORT, and BYTE to avoid such an issue. >> >> JMH data on P550 SBC for reference (@Param("15") private int size): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op >> ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 32.206 ±
0.005 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op > > Nice catch and fix. Looks good. > Can you add the test case `@param("15")` and maybe some more to ArrayFill.java? @Hamlin-Li Thanks for the review! I have added two sizes to @Param. Size = 15 for this case and size = 7 for the case in JDK-8356593. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25210#issuecomment-2878515780 From duke at openjdk.org Wed May 14 03:28:36 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 14 May 2025 03:28:36 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub [v3] In-Reply-To: References: Message-ID: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> > The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. > The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and > INT granules in order. This could trigger misaligned accesses. We should copy the remainings > in this order: INT, SHORT, and BYTE to avoid such an issue. > > JMH data on P550 SBC for reference (@Param("15") private int size): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op > ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op > ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op > ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.347 ±
0.007 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: fix format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25210/files - new: https://git.openjdk.org/jdk/pull/25210/files/e9114839..432b31d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25210&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25210&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25210.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25210/head:pull/25210 PR: https://git.openjdk.org/jdk/pull/25210 From fyang at openjdk.org Wed May 14 03:30:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 14 May 2025 03:30:56 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub [v3] In-Reply-To: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> References: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> Message-ID: <5T4HirOfmjJpZ9dC4A6w_9Cs-ZLQAB-5qkR04CxCEOE=.77c0015e-8a7e-41a2-b511-a1734c3cf74e@github.com> On Wed, 14 May 2025 03:28:36 GMT, Anjian-Wen wrote: >> The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. >> The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and >> INT granules in order. This could trigger misaligned accesses. We should copy the remainings >> in this order: INT, SHORT, and BYTE to avoid such an issue. >> >> JMH data on P550 SBC for reference (@Param("15") private int size): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 961.604 ±
1.497 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op >> ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format Looks good. Thanks for finding this! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25210#pullrequestreview-2838613769 From duke at openjdk.org Wed May 14 03:33:00 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 14 May 2025 03:33:00 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: <4W0upz252QQIcjJ0Ca21ZzbbyZRfaH5TRpu2yU4IieU=.63c1a5a5-3774-4801-9a48-72ebc7c0508b@github.com> On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead.
>>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Ideal`. There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. >>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64, I understand your concern. In this patch, I check the uses of all `LoadB` nodes and only allow each of them to have a single use, which must be the `OrNode`; I check the `OrNode` as well. So I think it will not cause trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> In this case, because l1 has another use, none of these loads will be merged. >> >> In my previous patch, I tried to extract the value from the merged `LoadNode` if the original `LoadB` had other uses, such as being used by an uncommon trap. You can find this in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 by searching for MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my testing with jtreg tier1, it never hit a case that replaced a `LoadB` used by an uncommon trap; I think range check smearing removes all the uncommon trap uses. So I reverted it to keep the code simple. In my opinion, the extract_value function could be used as a general solution for other uses.
But we may need a cost model to weigh the cost of the new instructions used for extraction against the benefit of the merged load. To keep things simple, I chose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? > > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > >> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. > > This is really the pattern we use in `Ideal`. We replace the node at the bottom of an expression with a new node (or new expression). @eme64 Thanks for your careful review. I was working on other items and did not respond to your comments in time. Sorry for the delay. I will check whether we can remove the additional flag and use pattern matching for this case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2878525957 From dzhang at openjdk.org Wed May 14 04:10:32 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 14 May 2025 04:10:32 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions Message-ID: As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad.
### Testing qemu-system 9.1.0 with UseRVV (ubuntu24.10): * [x] Run test/jdk/jdk/incubator/vector (fastdebug) * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) ------------- Commit messages: - 8356924: RISC-V: Clean up cost for vector instructions Changes: https://git.openjdk.org/jdk/pull/25221/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25221&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356924 Stats: 164 lines in 1 file changed: 0 ins; 164 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25221.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25221/head:pull/25221 PR: https://git.openjdk.org/jdk/pull/25221 From chagedorn at openjdk.org Wed May 14 05:58:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 May 2025 05:58:50 GMT Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 [v2] In-Reply-To: References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> Message-ID: On Tue, 13 May 2025 16:03:09 GMT, Marc Chevalier wrote: >> On Linux, `assert` and such eventually use `abort` which give the return code 134 (128 + 6 (code of SIGABRT/SIGIOT)). On Windows, dying returns `-1` (exit code are more-or-less-signed int on Windows): >> >> https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 >> >> So let's make the IR framework aware of this: we consider there was a JVM error if the OS is windows and the return code -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Use Platform.isWindows That looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25200#pullrequestreview-2838822842 From duke at openjdk.org Wed May 14 06:29:56 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 14 May 2025 06:29:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 10:03:43 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build error on mac and windows > > src/hotspot/share/opto/addnode.cpp line 868: > >> 866: private: >> 867: // Detect if the embedding combine node is last one of combine operators >> 868: bool has_no_merge_load_combine_below( ) const; > > Suggestion: > > bool has_no_merge_load_combine_below() const; Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2088145386 From thartmann at openjdk.org Wed May 14 06:31:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 May 2025 06:31:54 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 22:46:10 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. >> >> The test passes after using this fix as shown below: >> >> Passed: compiler/c2/irTests/TestFPComparison.java >> Test results: passed: 1 >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java >> 1 1 0 0 0 >> ============================== >> TEST SUCCESS > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add TEMP(dst) Looks good to me. Testing on our side passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25101#pullrequestreview-2838897498 From galder at openjdk.org Wed May 14 06:44:00 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 14 May 2025 06:44:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:59:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > Next batch of comments. Will probably resume tomorrow :-) > @chhagedorn Ok, I tried my best with the `(Un)FilledTemplate` refactoring. I'm still not sure if I want to rename `FilledTemplate` to `RenderableTemplate`, it is not super satisfying for a beginner either. Naming is hard. If anybody else has a better idea than `(Un)FilledTemplate`, please let me know ;) > > I think one can continue reviewing this now! I've just quickly skimmed through this hierarchy. `(Un)FilledTemplate` reminds me a bit of the builder pattern. What about renaming `UnFilledTemplate` to `TemplateBuilder` and `FilledTemplate` to just `Template`?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2878868131 From duke at openjdk.org Wed May 14 06:44:54 2025 From: duke at openjdk.org (duke) Date: Wed, 14 May 2025 06:44:54 GMT Subject: RFR: 8356702: CTW: Update modules [v2] In-Reply-To: <3s1wwqmKLRzxZw2FL-s48FvSEcmzlYle_qZBn7YYvOE=.a09bdd82-7186-405a-a076-a2934b9bb3b3@github.com> References: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> <3s1wwqmKLRzxZw2FL-s48FvSEcmzlYle_qZBn7YYvOE=.a09bdd82-7186-405a-a076-a2934b9bb3b3@github.com> Message-ID: On Tue, 13 May 2025 07:24:14 GMT, Evgeny Nikitin wrote: >> This PR enhances CTW test wrappers generator in order to make it more user-friendly. Added features are: >> >> 1. Automatic scanning for modules list under `open/src` >> 2. Automatic recognition of current year; >> 3. Multi-wrapper modules support (allows for splitting huge modules into 2 and more wrappers) >> 4. ability to exclude modules; >> >> The updated generator have been used to refresh JTReg module wrappers. >> The most meaningful change is contained in the `generate.bash` >> Testing: `open/test/hotspot/jtreg/applications/ctw/modules` with the supported platforms, no failures spotted. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Revert "Update modified wrappers" > > This reverts commit d7122ccbf3b03a3c43917656ad209624910f6230. @lepestock Your change (at version 13a0b97b1bc5f2b6f39966da9915a250d0c773ab) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25175#issuecomment-2878869798 From fjiang at openjdk.org Wed May 14 07:29:54 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 14 May 2025 07:29:54 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub [v3] In-Reply-To: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> References: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> Message-ID: On Wed, 14 May 2025 03:28:36 GMT, Anjian-Wen wrote: >> The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. >> The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and >> INT granules in order. This could trigger misaligned accesses. We should copy the remainings >> in this order: INT, SHORT, and BYTE to avoid such an issue. >> >> JMH data on P550 SBC for reference (@Param("15") private int size): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op >> ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format looks good!
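The tail ordering described in the quoted PR text can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the RISC-V stub itself: it assumes the tail is fewer than 8 bytes, starts at an 8-byte-aligned offset, and that each granule is stored when the matching bit of the remaining count is set. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FillTailOrderSketch {
    // Granule sizes in bytes, in the old and the proposed order.
    static final int[] BYTE_SHORT_INT = { 1, 2, 4 };
    static final int[] INT_SHORT_BYTE = { 4, 2, 1 };

    // Returns {offset, size} pairs for the stores that cover a tail of
    // `remaining` bytes (0..7), starting at an 8-byte-aligned offset 0.
    static List<int[]> tailStores(int remaining, int[] granuleOrder) {
        List<int[]> stores = new ArrayList<>();
        int offset = 0;
        for (int size : granuleOrder) {
            if ((remaining & size) != 0) {
                stores.add(new int[] { offset, size });
                offset += size;
            }
        }
        return stores;
    }

    // Counts stores whose offset is not a multiple of their own size.
    static int countMisaligned(List<int[]> stores) {
        int misaligned = 0;
        for (int[] store : stores) {
            if (store[0] % store[1] != 0) {
                misaligned++;
            }
        }
        return misaligned;
    }

    public static void main(String[] args) {
        // A 15-byte fill leaves a 7-byte tail after one 8-byte bulk store.
        int remaining = 7;
        System.out.println("BYTE, SHORT, INT misaligned stores: "
                + countMisaligned(tailStores(remaining, BYTE_SHORT_INT)));
        System.out.println("INT, SHORT, BYTE misaligned stores: "
                + countMisaligned(tailStores(remaining, INT_SHORT_BYTE)));
    }
}
```

In this model, storing the largest granule first keeps every store naturally aligned for any tail length below 8, which is consistent with the direction of the JMH numbers quoted above for size 15.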
------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/25210#pullrequestreview-2839063150 From dfenacci at openjdk.org Wed May 14 07:31:58 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 14 May 2025 07:31:58 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 02:32:49 GMT, Jasmine Karthikeyan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > Thank you all for the comments! I've pushed an update that refactors the method to check for `min_value` ahead of doing abs, so that we can safely use `ABS()` instead of `uabs()`. I've refactored the behavior for constants to avoid using `uabs` there as well. A re-review would be appreciated! Thanks @jaskarth. Running tier1-3 tests... ------------- PR Comment: https://git.openjdk.org/jdk/pull/23685#issuecomment-2879038589 From gcao at openjdk.org Wed May 14 07:50:52 2025 From: gcao at openjdk.org (Gui Cao) Date: Wed, 14 May 2025 07:50:52 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions In-Reply-To: References: Message-ID: On Wed, 14 May 2025 04:04:11 GMT, Dingli Zhang wrote: > As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: > Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. > > Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. 
> > ### Testing > qemu-system 9.1.0 with UseRVV (ubuntu24.10): > * [x] Run test/jdk/jdk/incubator/vector (fastdebug) > * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) LGTM. ------------- Marked as reviewed by gcao (Author). PR Review: https://git.openjdk.org/jdk/pull/25221#pullrequestreview-2839130010 From thartmann at openjdk.org Wed May 14 07:53:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 14 May 2025 07:53:50 GMT Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 [v2] In-Reply-To: References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> Message-ID: On Tue, 13 May 2025 16:03:09 GMT, Marc Chevalier wrote: >> On Linux, `assert` and such eventually use `abort` which give the return code 134 (128 + 6 (code of SIGABRT/SIGIOT)). On Windows, dying returns `-1` (exit code are more-or-less-signed int on Windows): >> >> https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 >> >> So let's make the IR framework aware of this: we consider there was a JVM error if the OS is windows and the return code -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Use Platform.isWindows Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25200#pullrequestreview-2839142327 From mchevalier at openjdk.org Wed May 14 08:00:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 14 May 2025 08:00:57 GMT Subject: RFR: 8325647: [IR framework] Only prints stdout if exitCode is 134 [v2] In-Reply-To: References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> Message-ID: On Tue, 13 May 2025 16:03:09 GMT, Marc Chevalier wrote: >> On Linux, `assert` and such eventually use `abort` which give the return code 134 (128 + 6 (code of SIGABRT/SIGIOT)). On Windows, dying returns `-1` (exit code are more-or-less-signed int on Windows): >> >> https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 >> >> So let's make the IR framework aware of this: we consider there was a JVM error if the OS is windows and the return code -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Use Platform.isWindows Thanks @chhagedorn and @TobiHartmann for reviews! 
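The check described in the quoted PR text can be sketched roughly as follows. This is an illustration, not the IR framework's code: the class, constants, and method name are made up, and the OS check is passed in as a boolean so the snippet stays self-contained (the actual change uses `Platform.isWindows`, per the commit note quoted above).

```java
// Illustrative sketch; all names here are hypothetical.
final class JvmErrorExitCodeSketch {
    private JvmErrorExitCodeSketch() {}

    // Signal number of SIGABRT on Linux, as stated in the PR description.
    static final int SIGABRT = 6;
    // A process killed by a signal is reported as 128 + signal number, hence 134.
    static final int ABORT_EXIT_CODE = 128 + SIGABRT;
    // Exit code the PR description gives for a dying JVM on Windows.
    static final int WINDOWS_ABORT_EXIT_CODE = -1;

    /** Returns true if the exit code matches the "JVM died" code for the given OS. */
    static boolean looksLikeJvmError(int exitCode, boolean isWindows) {
        int expected = isWindows ? WINDOWS_ABORT_EXIT_CODE : ABORT_EXIT_CODE;
        return exitCode == expected;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeJvmError(134, false));
        System.out.println(looksLikeJvmError(-1, true));
        System.out.println(looksLikeJvmError(134, true));
    }
}
```

Passing the platform in as a parameter is only a convenience for the example; it also makes both branches easy to exercise on a single machine.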
------------- PR Comment: https://git.openjdk.org/jdk/pull/25200#issuecomment-2879156896 From mchevalier at openjdk.org Wed May 14 08:00:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 14 May 2025 08:00:57 GMT Subject: Integrated: 8325647: [IR framework] Only prints stdout if exitCode is 134 In-Reply-To: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> References: <8Chv6pneN7s8OzJxIjGxfNVmr1q-StTW1PuGNC3yBJE=.c9f940d8-faac-4114-b3a0-ff449f73c8b5@github.com> Message-ID: On Tue, 13 May 2025 08:03:21 GMT, Marc Chevalier wrote: > On Linux, `assert` and such eventually use `abort` which give the return code 134 (128 + 6 (code of SIGABRT/SIGIOT)). On Windows, dying returns `-1` (exit code are more-or-less-signed int on Windows): > > https://github.com/openjdk/jdk/blob/2b3254160933e8b11527f801507a9c01b90d22b0/src/hotspot/os/windows/os_windows.cpp#L1382-L1384 > > So let's make the IR framework aware of this: we consider there was a JVM error if the OS is windows and the return code -1, or if it's 134 otherwise. I'm not sure what's the most idiomatic/robust way to check whether we are on Windows or not, but it's not customer code: it just needs to work for testing. This pull request has now been integrated. 
Changeset: 3b271981 Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/3b271981662df2a7fdf04ffd75d017964425607c Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8325647: [IR framework] Only prints stdout if exitCode is 134 Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25200 From bkilambi at openjdk.org Wed May 14 08:09:53 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 14 May 2025 08:09:53 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 00:59:44 GMT, Hao Sun wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 553: > >> 551: >> 552: // vector add - predicated >> 553: BINARY_OP_PREDICATE(vaddB, AddVB, sve_add, B) > > stylistic nit: remove the extra spaces. > > In the initial commit, this extra space is added due to `vaddHF`. However, in the latest commit the predicated Float16 rules are removed. Good catch! I'll update a PS soon. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2088333521 From mli at openjdk.org Wed May 14 08:14:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 14 May 2025 08:14:55 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub [v3] In-Reply-To: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> References: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> Message-ID: On Wed, 14 May 2025 03:28:36 GMT, Anjian-Wen wrote: >> The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. >> The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and >> INT granules in order. 
This could trigger misaligned accesses. We should copy the remainings >> in this order: INT, SHORT, and BYTE to avoid such an issue. >> >> JMH data on P550 SBC for reference (@Param("15") private int size): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op >> ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25210#pullrequestreview-2839204641 From mli at openjdk.org Wed May 14 08:14:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 14 May 2025 08:14:56 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub In-Reply-To: References: Message-ID: On Tue, 13 May 2025 19:44:12 GMT, Hamlin Li wrote: >> The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. >> The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and >> INT granules in order. This could trigger misaligned accesses. We should copy the remainings >> in this order: INT, SHORT, and BYTE to avoid such an issue.
>> >> JMH data on P550 SBC for reference (@Param("15") private int size): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op >> ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op > > Nice catch and fix. Looks good. > Can you add the test case `@param("15")` and maybe some more to ArrayFill.java? > @Hamlin-Li Thanks for the review! I have added two sizes to @param. Size = 15 for this case and size = 7 for the case in JDK-8356593. Thank you for updating. Looks good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25210#issuecomment-2879219627 From duke at openjdk.org Wed May 14 08:19:57 2025 From: duke at openjdk.org (duke) Date: Wed, 14 May 2025 08:19:57 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub [v3] In-Reply-To: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> References: <6xC6wZ3gPlHPOO7rytPY5Tv-uSd9SZu0WIT9kHsWihE=.f37cabe3-fc45-4acc-9094-27893e259974@github.com> Message-ID: <8HkokMUsxMEQnQgKZvvOqCpRutSlauR5-qde8ddogFI=.a05b508c-01c2-45f6-8a01-e2a919492417@github.com> On Wed, 14 May 2025 03:28:36 GMT, Anjian-Wen wrote: >> The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses.
>> The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and >> INT granules in order. This could trigger misaligned accesses. We should copy the remainings >> in this order: INT, SHORT, and BYTE to avoid such an issue. >> >> JMH data on P550 SBC for reference (@Param("15") private int size): >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op >> ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.007 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix format @Anjian-Wen Your change (at version 432b31d68883b934ab31a56cb867f381fd533997) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25210#issuecomment-2879239141 From duke at openjdk.org Wed May 14 08:19:55 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 14 May 2025 08:19:55 GMT Subject: RFR: 8356869: RISC-V: Improve tail handling of array fill stub In-Reply-To: References: Message-ID: On Wed, 14 May 2025 08:11:54 GMT, Hamlin Li wrote: >> Nice catch and fix. Looks good. >> Can you add the test case `@param("15")` and maybe some more to ArrayFill.java? > >> @Hamlin-Li Thanks for the review! I have added two sizes to @param.
Size = 15 for this case and size = 7 for the case in JDK-8356593. > > Thank you for updating. Looks good! @Hamlin-Li @feilongjiang @RealFYang Thanks for your approval! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25210#issuecomment-2879234947 From epeter at openjdk.org Wed May 14 08:24:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 08:24:01 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:59:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > Next batch of comments. Will probably resume tomorrow :-) > > @chhagedorn Ok, I tried my best with the `(Un)FilledTemplate` refactoring. I'm still not sure if I want to rename `FilledTemplate` to `RenderableTemplate`, it is not super satisfying for a beginner either. Naming is hard. If anybody else has a better idea than `(Un)FilledTemplate`, please let me know ;) > > I think one can continue reviewing this now! > > I've just quickly skimmed through this hierarchy. `(Un)FilledTemplate` reminds me a bit of the builder pattern. What about renaming `UnFilledTemplate` to `TemplateBuilder` and `FilledTemplate` to just `Template`? @galderz Thanks for the comment! For me both `UnfilledTemplate` and `FilledTemplate` are Templates. The unfilled one has the arguments not yet applied, the filled one has the arguments applied.
Calling the `UnfilledTemplate` a `TemplateBuilder` seems a little odd, because it is basically already Template, it just has some holes that need to be filled with arguments. In that sense, it is really similar to what the Java String Template was supposed to be. Personally, I like the explicit naming of `(Un)FilledTemplate`, together with the `fillWith`. This makes it very clear what the "pipeline" is supposed to be. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2879257421 From epeter at openjdk.org Wed May 14 08:27:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 08:27:17 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:59:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > Next batch of comments. Will probably resume tomorrow :-) @chhagedorn And I had discussed previously that it is nice to have the static methods like `let`, `body`, `$` etc under `Template`, but then not to use it for anything else. Sure, we could move the static methods under `TemplateUtils` or alike, if we really need to free up the name. But the bigger argument for me is really that `(Un)FilledTemplate` makes things very explicit. 
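To make the naming discussion concrete, here is a toy two-stage pipeline. It is not the Template Framework's API: the record shapes, the single-argument `fillWith`, and `render` on the filled type are simplifications assumed purely for illustration.

```java
import java.util.function.Function;

// Toy model of the "unfilled -> filled -> rendered" pipeline; every name is hypothetical.
final class TemplateNamingSketch {
    private TemplateNamingSketch() {}

    /** A template whose argument has not been applied yet: it still has a "hole". */
    record UnfilledTemplate<A>(Function<A, String> body) {
        /** Applies the argument and yields a template that is ready to render. */
        FilledTemplate fillWith(A arg) {
            return new FilledTemplate(body.apply(arg));
        }
    }

    /** A template with its argument applied; only rendering is left to do. */
    record FilledTemplate(String text) {
        String render() {
            return text;
        }
    }

    public static void main(String[] args) {
        UnfilledTemplate<Integer> unfilled =
                new UnfilledTemplate<>(value -> "int x = " + value + ";");
        System.out.println(unfilled.fillWith(42).render());
    }
}
```

In this toy version both stages are already "templates", which is the point made above: the first stage is less a builder that assembles a template than a template that is waiting for its arguments.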
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2879267349 From mli at openjdk.org Wed May 14 08:30:07 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 14 May 2025 08:30:07 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. > Thanks! > > ## Test > > Performance data > > Benchmark | (vectorDim) | Mode | Cnt | Score - patch | Score - master | Improvement (master/patch) | Error | Units > -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 382.123 | 2595.718 | 6.793 | 0.631 | ns/op > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 563.726 | 5167.687 | 9.167 | 0.063 | ns/op > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 888.455 | 9468.714 | 10.658 | 0.147 | ns/op > Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 1540.255 | 18879.796 | 12.258 | 0.396 | ns/op > Float16OperationsBenchmark.divBenchmark | 256 | avgt | 10 | 579.959 | 4028.335 | 6.946 | 0.008 | ns/op > Float16OperationsBenchmark.divBenchmark | 512 | avgt | 10 | 914.634 | 8034.234 | 8.784 | 0.027 | ns/op > Float16OperationsBenchmark.divBenchmark | 1024 | avgt | 10 | 1494.017 | 15125.924 | 10.124 | 0.292 | ns/op > Float16OperationsBenchmark.divBenchmark | 2048 | avgt | 10 | 2728.517 | 30197.97 | 11.068 | 32.869 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 256 | avgt | 10 | 476.764 | 2817.035 | 5.909 | 0.012 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 512 | avgt | 10 | 707.035 | 5239.438 | 7.41 | 0.129 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 1024 | avgt | 10 | 1114.29 | 7361.105 | 6.606 | 0.024 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 2048 | avgt | 10 | 1931.713 | 14465.602 | 7.488 | 1.852 | ns/op > 
Float16OperationsBenchmark.maxBenchmark | 256 | avgt | 10 | 501.892 | 3754.563 | 7.481 | 0.408 | ns/op > Float16OperationsBenchmark.maxBenchmark | 512 | avgt | 10 | 738.148 | 7450.666 | 10.094 | 1.206 | ns/op > Float16OperationsBenchmark.maxBenchmark | 1024 | avgt | 10 | 1195.262 | 15463.892 | 12.938 | 8.889 | ns/op > Float16OperationsBenchmark.maxBenchmark | 2048 | avgt | 10 | 2253.656 | 30649.239 | 13.6 | 6.154 | ns/op > Float16OperationsBenchmark.minBenchmark | 256 | avgt | 10 | 501.873 | 3753.9 | 7.48 | 0.298 | ns/op > Float16OperationsBenchmark.minBenchmark ... Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: minor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25181/files - new: https://git.openjdk.org/jdk/pull/25181/files/0ad3b5a9..0e165958 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25181&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25181&range=01-02 Stats: 24 lines in 1 file changed: 0 ins; 8 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/25181.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25181/head:pull/25181 PR: https://git.openjdk.org/jdk/pull/25181 From mli at openjdk.org Wed May 14 08:30:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 14 May 2025 08:30:08 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3] In-Reply-To: <4-yFRcNLDHjzdKZ_WK_wHjEr49ADQBoOuQ1u8doVB08=.0bb9788d-ee5e-4999-a687-bdc1f34f4f3f@github.com> References: <4-yFRcNLDHjzdKZ_WK_wHjEr49ADQBoOuQ1u8doVB08=.0bb9788d-ee5e-4999-a687-bdc1f34f4f3f@github.com> Message-ID: On Wed, 14 May 2025 00:21:44 GMT, Fei Yang wrote: >> No, it uses T_SHORT instead, in Float16.java it also uses a short as underlying payload. >> And if you check the generated assembly code, you'll find some code like `vsetivli t0,16,e16,m1,tu,mu`. >> >> To avoid confusion, I will add an assertion here so that it can be understood later. 
> > Thanks for the answer. I see that is also reflected in the C2 source code [1]. > Why not save this `Matcher::vector_element_basic_type(this)` call then? I mean: > > assert(Matcher::vector_element_basic_type(this) == T_SHORT, "must"); > __ vsetvli_helper(T_SHORT, Matcher::vector_length(this)); > > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L63 Makes sense, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25181#discussion_r2088371205 From duke at openjdk.org Wed May 14 08:31:06 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 14 May 2025 08:31:06 GMT Subject: Integrated: 8356869: RISC-V: Improve tail handling of array fill stub In-Reply-To: References: Message-ID: On Tue, 13 May 2025 12:53:40 GMT, Anjian-Wen wrote: > The tail handling after bulk copy in array fill stub may trigger misaligned memory accesses. > The address is 8-byte aligned after bulk copy and the tail handling copies BYTE, SHORT, and > INT granules in order. This could trigger misaligned accesses. We should copy the remainings > in this order: INT, SHORT, and BYTE to avoid such an issue. > > JMH data on P550 SBC for reference (@Param("15") private int size): > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 961.604 ± 1.497 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.355 ± 0.024 ns/op > ArrayFill.fillShortArray 15 avgt 12 569.499 ± 0.662 ns/op > ArrayFill.zeroByteArray 15 avgt 12 957.080 ± 5.358 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.344 ± 0.006 ns/op > ArrayFill.zeroShortArray 15 avgt 12 569.730 ± 0.441 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 15 avgt 12 32.206 ± 0.005 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.347 ± 0.007 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.732 ± 0.451 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.208 ± 0.007 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.346 ±
0.007 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.006 ns/op This pull request has now been integrated. Changeset: b76b6107 Author: Anjian-Wen Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/b76b610788cea7149a04faeeba01067272b6e046 Stats: 20 lines in 2 files changed: 5 ins; 5 del; 10 mod 8356869: RISC-V: Improve tail handling of array fill stub Reviewed-by: fyang, fjiang, mli ------------- PR: https://git.openjdk.org/jdk/pull/25210 From epeter at openjdk.org Wed May 14 08:32:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 08:32:57 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:26:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation fixes Just to fill things in: Christian proposed the names `(Un)FilledTemplate`, before that I had `Template` and `TemplateWithArgs`, and you would do `Template.withArgs(...)` to get the `TemplateWithArgs`. I would prefer `Template -> TemplateWithArgs`, because the template without args applied is for me the real template. 
But `(Un)FilledTemplate` is slightly better because it does not let the user wonder what the `Template` is supposed to be, because there is no "qualifying" word in the name. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2879287878 From mhaessig at openjdk.org Wed May 14 08:37:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 14 May 2025 08:37:56 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 09:25:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode.
Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. >> >> # Testing >> >> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) >> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing >> >> # Acknowledgements >> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Add jsr to falls_through() Thank you both for your thoughtful comments and reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2879301975 From duke at openjdk.org Wed May 14 08:37:56 2025 From: duke at openjdk.org (duke) Date: Wed, 14 May 2025 08:37:56 GMT Subject: RFR: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 09:25:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode.
Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: >> >> getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" >> areturn; >> // The following is unreachable >> iconst_0; >> >> >> This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. falls through to the next bytecode. Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: >> https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 >> >> # Change Summary >> >> To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. 
>>
>> # Testing
>>
>> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439)
>> - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing
>>
>> # Acknowledgements
>> Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms.
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add jsr to falls_through()

@mhaessig Your change (at version c18b8dc039e0b9600118148f7248712f7870815a) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25118#issuecomment-2879310988

From dnsimon at openjdk.org Wed May 14 08:42:12 2025
From: dnsimon at openjdk.org (Doug Simon)
Date: Wed, 14 May 2025 08:42:12 GMT
Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v15]
In-Reply-To:
References:
Message-ID:

On Thu, 8 May 2025 21:19:37 GMT, Erik Österlund wrote:

>> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 54 additional commits since the last revision:
>>
>> - Fix null check
>> - Remove unnecessary include
>> - Add nullptr check to relocate
>> - Fix JVMCI nmethod data
>> - Unexclude JVMCI methods
>> - Add relocate_nmethod_mirror
>> - Only hold NMethodState_lock when needed
>> - Exclude JVMCI nmethods
>> - Remove StressNMethodRelocation
>> - Fix branch_range revert
>> - ... and 44 more: https://git.openjdk.org/jdk/compare/b78b9289...9ca3563a
>
> src/hotspot/share/jvmci/jvmciRuntime.cpp line 852:
>
>> 850:
>> 851: void JVMCINMethodData::relocate_nmethod_mirror(nmethod* nm) {
>> 852:   oop nmethod_mirror = get_nmethod_mirror(nm, /* phantom_ref */ false);
>
> Why is phantom false?
I assume that's copied from `JVMCINMethodData::invalidate_nmethod_mirror`, which was updated in https://github.com/openjdk/jdk/commit/f81c192da929d72be5134ccf195be2a985737504. The description for [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359) implies that this somehow avoids enqueuing a potentially dead object to the SATB buffer. Is that what we want here, @tkrodriguez?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2088398396

From shade at openjdk.org Wed May 14 09:53:01 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 14 May 2025 09:53:01 GMT
Subject: RFR: 8356946: x86: Optimize interpreter profile updates
Message-ID:

Noticed two awkward things in the current x86 interpreter profiling code.

First, we carry the implementation for counter decrements without using it. This is dead code and can be purged.

Second, we care about overflows for 64-bit for some reason. I think this is a remnant of 32-bit x86 support, where we can plausibly have a 32-bit counter overflow in a reasonable timeframe. But for a 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. So we can save a few instructions / memory accesses on this path.
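
As a rough sanity check of the overflow argument, here is a back-of-the-envelope sketch. It is not part of the PR: the class and method names are hypothetical, the counter is treated as an unsigned value starting at zero, and the rate of 10^9 increments per second is an assumption chosen only for illustration.

```java
public class CounterOverflowEstimate {
    // Average length of a year in seconds (365.25 days).
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Seconds until an unsigned counter of the given bit width wraps,
    // assuming it starts at zero and is incremented at a constant rate.
    static double secondsToOverflow(int bits, double incrementsPerSecond) {
        return Math.pow(2.0, bits) / incrementsPerSecond;
    }

    // The same estimate expressed in years.
    static double yearsToOverflow(int bits, double incrementsPerSecond) {
        return secondsToOverflow(bits, incrementsPerSecond) / SECONDS_PER_YEAR;
    }

    public static void main(String[] args) {
        double rate = 1e9; // assumed increments per second, not a measurement
        System.out.printf("32-bit counter: %.1f seconds%n", secondsToOverflow(32, rate));
        System.out.printf("64-bit counter: %.0f years%n", yearsToOverflow(64, rate));
    }
}
```

Under that assumed rate a 32-bit counter wraps within seconds, while a 64-bit counter needs far longer than any realistic VM lifetime. A different assumed rate scales both figures proportionally.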
Additional testing:
 - [x] Linux x86_64 server fastdebug, `tier1`
 - [ ] Linux x86_64 server fastdebug, `all`

-------------

Commit messages:
 - Touchup
 - Fix

Changes: https://git.openjdk.org/jdk/pull/25223/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25223&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8356946
  Stats: 40 lines in 2 files changed: 0 ins; 30 del; 10 mod
  Patch: https://git.openjdk.org/jdk/pull/25223.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25223/head:pull/25223

PR: https://git.openjdk.org/jdk/pull/25223

From rcastanedalo at openjdk.org Wed May 14 10:05:59 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 14 May 2025 10:05:59 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v16]
In-Reply-To:
References:
Message-ID:

On Mon, 12 May 2025 14:26:37 GMT, Emanuel Peter wrote:

>> **Goal**
>> We want to generate Java source code:
>> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
>> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>>
>> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>>
>> **How to get started**
>> When reviewing, please start by looking at:
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76
>>
>> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation fixes Just a couple of glitches while reading the tutorial examples. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 249: > 247: // We can define a custom hook. > 248: // Note: generally we prefer using the pre-defined CLASS_HOOK and METHOD_HOOK from the library, > 249: // when ever possible. 
See also the example after this one.

Suggestion:

        // whenever possible. See also the example after this one.

test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 299:

> 297:
> 298:     // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK
> 299:     // from the Temlate Library.

Suggestion:

    // from the Template Library.

test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 481:

> 479:                 Hooks.CLASS_HOOK.insert(templateStaticField.fillWith(myLong)),
> 480:                 templateStatus,
> 481:                 // We should see a mix if fields and variables sampled.

Just guessing from the context:

Suggestion:

                // We should see a mix of fields and variables sampled.

-------------

PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2839561192
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088564180
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088565075
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088566704

From aph at openjdk.org Wed May 14 10:47:40 2025
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 14 May 2025 10:47:40 GMT
Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v6]
In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com>
References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com>
Message-ID:

> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±
0.422 ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161 ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180 ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232 ns/op

Andrew Haley has updated the pull request incrementally with three additional commits since the last revision:

 - AvoidUnalignedAccesses
 - Temp
 - The cherry on the cake

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25147/files
  - new: https://git.openjdk.org/jdk/pull/25147/files/87eadb40..e5771988

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=04-05

  Stats: 20 lines in 1 file changed: 17 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/25147.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147

PR: https://git.openjdk.org/jdk/pull/25147

From fyang at openjdk.org Wed May 14 10:47:53 2025
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 14 May 2025 10:47:53 GMT
Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 14 May 2025 08:30:07 GMT, Hamlin Li wrote:

>> Hi,
>> Can you help to review this patch?
>> It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90.
>> Thanks!
>>
>> ## Test
>>
>> Performance data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - patch | Score - master | Improvement (master/patch) | Error | Units
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 382.123 | 2595.718 | 6.793 | 0.631 | ns/op
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 563.726 | 5167.687 | 9.167 | 0.063 | ns/op
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 888.455 | 9468.714 | 10.658 | 0.147 | ns/op
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 1540.255 | 18879.796 | 12.258 | 0.396 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 256 | avgt | 10 | 579.959 | 4028.335 | 6.946 | 0.008 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 512 | avgt | 10 | 914.634 | 8034.234 | 8.784 | 0.027 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 1024 | avgt | 10 | 1494.017 | 15125.924 | 10.124 | 0.292 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 2048 | avgt | 10 | 2728.517 | 30197.97 | 11.068 | 32.869 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 256 | avgt | 10 | 476.764 | 2817.035 | 5.909 | 0.012 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 512 | avgt | 10 | 707.035 | 5239.438 | 7.41 | 0.129 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 1024 | avgt | 10 | 1114.29 | 7361.105 | 6.606 | 0.024 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 2048 | avgt | 10 | 1931.713 | 14465.602 | 7.488 | 1.852 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 256 | avgt | 10 | 501.892 | 3754.563 | 7.481 | 0.408 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 512 | avgt | 10 | 738.148 | 7450.666 | 10.094 | 1.206 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 1024 | avgt | 10 | 1195.262 | 15463.892 | 12.938 | 8.889 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 2048 | avgt | 10 | 2253.656 | 30649.239 | 13.6 | 6.154 | ns/op
>> Float16OperationsBenchmark.minBenchmark | 256 | avgt | 10 | 501.873 |
3753.9 | 7.48 ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Updated change looks good. And nice JMH numbers! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25181#pullrequestreview-2839695256 From mli at openjdk.org Wed May 14 10:59:51 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 14 May 2025 10:59:51 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 10:44:51 GMT, Fei Yang wrote: > Updated change looks good. And nice JMH numbers! Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25181#issuecomment-2879751747 From jbhateja at openjdk.org Wed May 14 11:37:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 14 May 2025 11:37:53 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: <-KhR1iHMlrE3DxQflBGzBrtriF2bfiA0C4eLacFF-Uc=.3d324ec2-38e4-4119-9fcd-ae59338bde5d@github.com> On Mon, 12 May 2025 22:46:10 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. >> >> The test passes after using this fix as shown below: >> >> Passed: compiler/c2/irTests/TestFPComparison.java >> Test results: passed: 1 >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java >> 1 1 0 0 0 >> ============================== >> TEST SUCCESS > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add TEMP(dst) Fix looks good to me. 
src/hotspot/cpu/x86/x86_64.ad line 6597:

> 6595:   ins_encode %{
> 6596:     __ ecmovq(Assembler::parity, $dst$$Register, $src1$$Register, $src2$$Register);
> 6597:     __ cmovq(Assembler::notEqual, $dst$$Register, $src2$$Register);

FTR, instruction sequence for non-APX:

    CMOVE DST, SRC            if FLAG is PF
    CMOVE DST, SRC            if FLAG is NE

With APX:

    CMOVE DST, DEF_DST, SRC   if FLAG is PF
    CMOVE DST, DEF_DST, SRC   if FLAG is NE

Root cause: with back-to-back CMOVEs, if ZF is set then NE is false, so the second CMOVE updates DST with DEF_DST and does not retain the DST value written by the prior CMOVE. This is not the case in the non-APX sequence, since there the second CMOVE is not executed if ZF is set and DST retains the value set by the prior CMOVE.

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25101#pullrequestreview-2839832657
PR Review Comment: https://git.openjdk.org/jdk/pull/25101#discussion_r2088726274

From rcastanedalo at openjdk.org Wed May 14 11:50:55 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 14 May 2025 11:50:55 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v16]
In-Reply-To:
References:
Message-ID:

On Mon, 12 May 2025 14:26:37 GMT, Emanuel Peter wrote:

>> **Goal**
>> We want to generate Java source code:
>> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
>> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>>
>> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation fixes test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 78: > 76: testMultiLine(); > 77: testBodyTokens(); > 78: testWithOneArguments(); Another small glitch: Suggestion: testWithOneArgument(); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088749854 From duke at openjdk.org Wed May 14 12:04:56 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 14 May 2025 12:04:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v15] In-Reply-To: References: Message-ID: On Fri, 2 May 2025 10:16:56 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix build error on mac and windows > > src/hotspot/share/opto/addnode.cpp line 975: > >> 973: MergeLoadInfo MergePrimitiveLoads::merge_load_info(LoadNode* load) const { >> 974: const MergeLoadInfo invalid = MergeLoadInfo(); >> 975: const Node* check = bypass_i2l(load); > > What does `check` stand for? Might `load_use` be more descriptive? It's a `LoadNode` or a `ConverI2L` which is unique output of the `Load`, I didn't find a good name, could I add comment to describe node type? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24023#discussion_r2088775699 From epeter at openjdk.org Wed May 14 13:02:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 13:02:45 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v17] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. 
E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. 
> - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Review suggestions by Roberto Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/0871fcda..95d44f3a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=15-16 Stats: 5 lines in 2 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 14 13:02:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 13:02:46 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 10:03:24 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > Just a couple of glitches while reading the tutorial examples. 
@robcasloz Thanks for the fixes, all applied :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2880160089 From epeter at openjdk.org Wed May 14 13:02:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 13:02:48 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:26:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
>> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation fixes test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 161: > 159: } > 160: > 161: public static void testWithOneArguments() { Suggestion: public static void testWithOneArgument() { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088884517 From mhaessig at openjdk.org Wed May 14 13:16:02 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 14 May 2025 13:16:02 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v17] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:02:45 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. 
random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. 
>> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions by Roberto > > Co-authored-by: Roberto Casta?eda Lozano Thank you, Emanuel, for working on this! I'm already looking forward to using it. I did a superficial pass to get an overview and gain understanding. Apart from some typos, my main concern is reproducability with the randomness introduced in `NameSet`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2839772186 From mhaessig at openjdk.org Wed May 14 13:16:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Wed, 14 May 2025 13:16:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:26:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). 
We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation fixes test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 84: > 82: * Creates a normal frame, which has a {@link parent} and which defines an inner > 83: * {@link NameSet}, for the names that are generated inside this frame. Once this > 84: * frame is exited, the name from inside this frame are not available any more. Suggestion: * frame is exited, the name from inside this frame are not available anymore. Another typo nit. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 91: > 89: > 90: /** > 91: * Creates a special frame, which has a {@link parent} and but uses the {@link NameSet} Suggestion: * Creates a special frame, which has a {@link parent} but uses the {@link NameSet} Not 100% sure, but I think you meant to use the "but". test/hotspot/jtreg/compiler/lib/template_framework/Name.java line 40: > 38: * @param type The type with which we restrict {@link Template#weighNames} and {@link Template#sampleName}. > 39: * @param mutable Defines if the name is considered mutable or immutable. > 40: * @param weight The weight measured by {@link Template#weighNames} and according to which we sample with {@link Template#sampleName}. Suggestion: * @param weight The weight measured by {@link Template#weightNames} and according to which we sample with {@link Template#sampleName}. test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 37: > 35: */ > 36: class NameSet { > 37: static final Random RANDOM = Utils.getRandomInstance(); IIUC, we can generate slightly different programs by sampling variables from name sets. For the purposes of reproducing tests it might be useful to seed the randomness and print the seed. test/hotspot/jtreg/compiler/lib/template_framework/README.md line 6: > 4: We want to make it easy to generate variants of tests. 
Often, we would like to have a set of tests, corresponding to a set of types, a set of operators, a set of constants, etc. Writing all the tests by hand is cumbersome or even impossible. When generating such tests with scripts, it would be preferable if the code generation happens automatically, and the generator script was checked into the code base. Code generation can go beyond simple regression tests, and one might want to generate random code from a list of possible templates, to fuzz individual Java features and compiler optimizations. > 5: > 6: The Template Framework provides a facility to generate code with Templates. Templates are essencially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. Suggestion: The Template Framework provides a facility to generate code with Templates. Templates are essentially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 215: > 213: > 214: // We need a CodeFrame to which the hook can insert code. That way, name > 215: // definitions at the hook cannot excape the hookCodeFrame. Suggestion: // definitions at the hook cannot escape the hookCodeFrame. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 328: > 326: /** > 327: * Retrieves the dollar replacement of the {@code 'name'} for the > 328: * current Template that is being instanciated. 
It returns the same Suggestion: * current Template that is being instantiated. It returns the same test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 357: > 355: * let("b", a * 5), > 356: * """ > 357: * System.out.prinln("Use a and b with hashtag replacement: #a and #b"); Suggestion: * System.out.println("Use a and b with hashtag replacement: #a and #b"); test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 405: > 403: > 404: /** > 405: * The default amount of fuel spent per Template. It is suptracted from the current {@link #fuel} at every Suggestion: * The default amount of fuel spent per Template. It is subtracted from the current {@link #fuel} at every test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 411: > 409: > 410: /** > 411: * The current remaining fuel for nested Templates. Every level of Template nestig Suggestion: * The current remaining fuel for nested Templates. Every level of Template nesting test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java line 28: > 26: /** > 27: * To facilitate recursive uses of Templates, e.g. where a template uses > 28: * itself, where a template needs to be referenced before it is fully defined, Suggestion: * itself and needs to be referenced before it is fully defined, Nit for slightly better text flow. test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 36: > 34: * The {@link parent} relationship provides a trace for the use chain of templates. 
> 35: * The {@link fuel} is reduced over this chain, to give a heuristic on how much time > 36: * is spend on the code from the template corrsponding to the frame, and to give a Suggestion: * is spent on the code from the template corresponding to the frame, and to give a test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 80: > 78: comp.compile(); > 79: > 80: // Object ret = p.xyz.InnterTest.main(); Suggestion: // Object ret = p.xyz.InnerTest.main(); test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 96: > 94: // - For a chosen type, operator, and generator. > 95: // - The variable name "GOLD" and the test name "test" would get conflicts > 96: // if we instanciate the template multiple times. Thus, we use the $ prefix Suggestion: // if we instantiate the template multiple times. Thus, we use the $ prefix test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java line 51: > 49: comp.compile(); > 50: > 51: // Object ret = p.xyz.InnterTest.test(); Suggestion: // Object ret = p.xyz.InnerTest.test(); test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 77: > 75: comp.compile(); > 76: > 77: // Object ret = p.xyz.InnterTest1.main(); Suggestion: // Object ret = p.xyz.InnerTest1.main(); test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 140: > 138: // replacements. Appending as a Token works whenever one has a reference > 139: // to the Object in Java code. But often, this is rather cumbersome and > 140: // looks awkward, given al the additional quotes and commans required. Suggestion: // looks awkward, given all the additional quotes and commands required. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 249: > 247: // We can define a custom hook. 
> 248: // Note: generally we prefer using the pre-defined CLASS_HOOK and METHOD_HOOK from the library, > 249: // when ever possible. See also the example after this one. Perhaps this tutorial should quickly explain the concept of hooks before using them. That might be a duplication of the documentation in `Hook.java`, but it would help the reading flow of the tutorial to grok the concept a bit easier without having to jump between files. I guess it is more of a problem when reading the tutorial on Github, as in the IDE you would get a preview for the documentation. Perhaps like this: Suggestion: // In this example, we look at the use of Hooks. They allow us to refer back in the Template and // to outer scopes, e.g. to define a field at the top of the class from inside a method. public static String generateWithCustomHooks() { // We can define a custom hook. // Note: generally we prefer using the pre-defined CLASS_HOOK and METHOD_HOOK from the library, // when ever possible. See also the example after this one. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 299: > 297: > 298: // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK > 299: // from the Temlate Library. Suggestion: // from the Template Library. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088847371 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088849342 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088860008 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088871462 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088872395 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088876295 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088798775 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088798370 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088800322 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088809659 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088895839 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088899984 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088828038 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088828476 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088825587 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088824843 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088823501 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088690378 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088695925 From mhaessig at openjdk.org Wed May 14 13:16:11 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 14 May 2025 13:16:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 12:47:12 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation
fixes > > test/hotspot/jtreg/compiler/lib/template_framework/Name.java line 40: > >> 38: * @param type The type with which we restrict {@link Template#weighNames} and {@link Template#sampleName}. >> 39: * @param mutable Defines if the name is considered mutable or immutable. >> 40: * @param weight The weight measured by {@link Template#weighNames} and according to which we sample with {@link Template#sampleName}. > > Suggestion: > > * @param weight The weight measured by {@link Template#weightNames} and according to which we sample with {@link Template#sampleName}. Perhaps documenting the weight limits might also be warranted since this constructor is used directly in tests? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088865843 From yzheng at openjdk.org Wed May 14 13:24:02 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 May 2025 13:24:02 GMT Subject: RFR: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler Message-ID: HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. ------------- Commit messages: - 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler. 
Changes: https://git.openjdk.org/jdk/pull/25225/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25225&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356971 Stats: 4 lines in 3 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25225.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25225/head:pull/25225 PR: https://git.openjdk.org/jdk/pull/25225 From epeter at openjdk.org Wed May 14 13:29:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 13:29:56 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v17] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:13:08 GMT, Manuel Hässig wrote: > Thank you, Emanuel, for working on this! I'm already looking forward to using it. > > I did a superficial pass to get an overview and gain understanding. Apart from some typos, my main concern is reproducibility with the randomness introduced in `NameSet`. @mhaessig Thanks for reviewing. The randomness depends on a seed, which is usually picked randomly, but can be fixed for reproducibility. That is why I always use `static final Random RANDOM = Utils.getRandomInstance();` from `jdk.test.lib.Utils`. Then the test produces this in the output: For random generator using seed: 3152575406766939100 To re-run test with same seed value please add "-Djdk.test.lib.random.seed=3152575406766939100" to command line. I'll have a look at your comments now :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2880247554 From dnsimon at openjdk.org Wed May 14 13:42:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 14 May 2025 13:42:52 GMT Subject: RFR: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:16:26 GMT, Yudi Zheng wrote: > HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort.
The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25225#pullrequestreview-2840252727 From rcastanedalo at openjdk.org Wed May 14 13:43:11 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 14 May 2025 13:43:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 14:26:37 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more documentation fixes A few more documentation suggestions, will continue reviewing this changeset over the next days. test/hotspot/jtreg/compiler/lib/template_framework/README.md line 6: > 4: We want to make it easy to generate variants of tests. 
Often, we would like to have a set of tests, corresponding to a set of types, a set of operators, a set of constants, etc. Writing all the tests by hand is cumbersome or even impossible. When generating such tests with scripts, it would be preferable if the code generation happens automatically, and the generator script was checked into the code base. Code generation can go beyond simple regression tests, and one might want to generate random code from a list of possible templates, to fuzz individual Java features and compiler optimizations. > 5: > 6: The Template Framework provides a facility to generate code with Templates. Templates are essencially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. Suggestion: The Template Framework provides a facility to generate code with Templates. A Template is essentially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 46: > 44: * > 45: *

> 46: * The Template Framework provides a facility to generate code with Templates. Templates are essencially a list Same grammar/spelling glitch as in the README file. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 50: > 48: * filled (replaced) by different values at each Template instantiation. For example, these "holes" can > 49: * be filled with different types, operators or constants. Templates can also be nested, allowing a modular > 50: * use of Templates. I suggest replacing this text with a high-level summary of what is already written in the README file (or the other way round, write a short summary in the README file and point to this file for more background). test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 150: > 148: * > 149: *

> 150: * Given a {@link UnfilledTemplate}, one must apply the required number of arguments, i.e. fill Suggestion: * Given an {@link UnfilledTemplate}, one must apply the required number of arguments, i.e. fill test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 152: > 150: * Given a {@link UnfilledTemplate}, one must apply the required number of arguments, i.e. fill > 151: * the Template, to arrive at a {@link FilledTemplate}. Note: {@link Template#make(Supplier)}, > 152: * i.e. the making a Template with zero arguments directly returns a {@link FilledTemplate}, Suggestion: * i.e. making a Template with zero arguments directly returns a {@link FilledTemplate}, test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 168: > 166: *

> 167: * When using nested Templates, there can be collisions with identifiers (e.g. variable names and method names). > 168: * For this, Templates provide dollar replacements, which automaticall rename any Suggestion: * For this, Templates provide dollar replacements, which automatically rename any test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 228: > 226: > 227: /** > 228: * Creates a {@link UnfilledTemplate} with one argument. Suggestion: * Creates an {@link UnfilledTemplate} with one argument. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 232: > 230: * > 231: *

> 232: * Here an example with template argument {@code 'a'}, captured once as string name Suggestion: * Here is an example with template argument {@code 'a'}, captured once as string name test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 248: > 246: * @param Type of the (first) argument. > 247: * @param arg0Name The name of the (first) argument for hashtag replacement. > 248: * @return A {@link UnfilledTemplate} with one argument. Suggestion: * @return An {@link UnfilledTemplate} with one argument. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 255: > 253: > 254: /** > 255: * Creates a {@link UnfilledTemplate} with two arguments. Suggestion: * Creates an {@link UnfilledTemplate} with two arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 277: > 275: * @param Type of the second argument. > 276: * @param arg1Name The name of the second argument for hashtag replacement. > 277: * @return A {@link UnfilledTemplate} with two arguments. Suggestion: * @return An {@link UnfilledTemplate} with two arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 284: > 282: > 283: /** > 284: * Creates a {@link UnfilledTemplate} with three arguments. Suggestion: * Creates an {@link UnfilledTemplate} with three arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 294: > 292: * @param Type of the third argument. > 293: * @param arg2Name The name of the third argument for hashtag replacement. > 294: * @return A {@link UnfilledTemplate} with three arguments. Suggestion: * @return An {@link UnfilledTemplate} with three arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 328: > 326: /** > 327: * Retrieves the dollar replacement of the {@code 'name'} for the > 328: * current Template that is being instanciated. It returns the same Suggestion: * current Template that is being instantiated. 
It returns the same test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 364: > 362: * @param key Name for the hashtag replacement. > 363: * @param value The value that the hashtag is replaced with. > 364: * @return A token that does nothing, so that the {@link #let} cal can easily be put in a list of tokens Suggestion: * @return A token that does nothing, so that the {@link #let} can easily be put in a list of tokens test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 390: > 388: * @param The type of the value. > 389: * @param function The function that is applied with the provided {@code 'value'}. > 390: * @return A token that does nothing, so that the {@link #let} cal can easily be put in a list of tokens Suggestion: * @return A token that does nothing, so that the {@link #let} can easily be put in a list of tokens test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 390: > 388: * @param The type of the value. > 389: * @param function The function that is applied with the provided {@code 'value'}. > 390: * @return A token that does nothing, so that the {@link #let} cal can easily be put in a list of tokens The return type is something else in this case, no? test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 394: > 392: * @throws RendererException if there is a duplicate hashtag {@code key}. > 393: */ > 394: static TemplateBody let(String key, T value, Function function) { I found it a bit confusing to find two methods called `let` that are pretty different in nature. Maybe you could rename this one to e.g. `letIn`? test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 405: > 403: > 404: /** > 405: * The default amount of fuel spent per Template. It is suptracted from the current {@link #fuel} at every Suggestion: * The default amount of fuel spent per Template. 
It is subtracted from the current {@link #fuel} at every test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 411: > 409: > 410: /** > 411: * The current remaining fuel for nested Templates. Every level of Template nestig Suggestion: * The current remaining fuel for nested Templates. Every level of Template nesting test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 419: > 417: * Example of a recursive Template, which checks the remaining {@link #fuel} at every level, > 418: * and terminates if it reaches zero. It also demonstrates the use of {@link TemplateBinding} for > 419: * the recursive use of Templates. We {@link FilledTemplate#render} with {@code 30} total fuel, and spending {@code 5} fuel at each recursion level. Suggestion: * the recursive use of Templates. We {@link FilledTemplate#render} with {@code 30} total fuel, and spend {@code 5} fuel at each recursion level. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 461: > 459: * @param name The {@link Name} to be added to the current code frame. > 460: * @return The token that performs the defining action. > 461: */ The concept of "code frame" is not clear here, maybe you can introduce it or replace it by a concept that is already defined? test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 37: > 35: public interface UnfilledTemplate { > 36: /** > 37: * A {@link UnfilledTemplate} with no arguments. Suggestion: * An {@link UnfilledTemplate} with no arguments. 
test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 48: > 46: /** > 47: * Creates a {@link FilledTemplate} which can be used as a {@link Token} inside > 48: * a {@link UnfilledTemplate} for nested code generation, and it can also be used with Suggestion: * an {@link UnfilledTemplate} for nested code generation, and it can also be used with test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 60: > 58: > 59: /** > 60: * A {@link UnfilledTemplate} with one argument. Suggestion: * An {@link UnfilledTemplate} with one argument. test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 74: > 72: /** > 73: * Creates a {@link FilledTemplate} which can be used as a {@link Token} inside > 74: * a {@link UnfilledTemplate} for nested code generation, and it can also be used with Suggestion: * an {@link UnfilledTemplate} for nested code generation, and it can also be used with test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 87: > 85: > 86: /** > 87: * A {@link UnfilledTemplate} with two arguments. Suggestion: * An {@link UnfilledTemplate} with two arguments. test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 103: > 101: /** > 102: * Creates a {@link FilledTemplate} which can be used as a {@link Token} inside > 103: * a {@link UnfilledTemplate} for nested code generation, and it can also be used with Suggestion: * an {@link UnfilledTemplate} for nested code generation, and it can also be used with test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 139: > 137: > 138: /** > 139: * A {@link UnfilledTemplate} with three arguments. Suggestion: * An {@link UnfilledTemplate} with three arguments. 
test/hotspot/jtreg/compiler/lib/template_framework/UnfilledTemplate.java line 157: > 155: /** > 156: * Creates a {@link FilledTemplate} which can be used as a {@link Token} inside > 157: * a {@link UnfilledTemplate} for nested code generation, and it can also be used with Suggestion: * an {@link UnfilledTemplate} for nested code generation, and it can also be used with test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 96: > 94: // - For a chosen type, operator, and generator. > 95: // - The variable name "GOLD" and the test name "test" would get conflicts > 96: // if we instanciate the template multiple times. Thus, we use the $ prefix Suggestion: // if we instantiate the template multiple times. Thus, we use the $ prefix ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2839877707 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088754564 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088764847 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088762304 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088854248 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088857353 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088892230 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088947906 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088911264 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088956199 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088949253 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088956811 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088950157 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088957259 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088964462 PR 
Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088918402 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088922386 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088970477 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088927510 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088944076 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088929241 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088931472 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088939632 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088957977 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088958376 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088958853 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088959282 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088959723 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088960151 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088960450 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088960797 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2088965249 From mchevalier at openjdk.org Wed May 14 13:43:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 14 May 2025 13:43:52 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling In-Reply-To: References: Message-ID: On Tue, 13 May 2025 10:36:55 GMT, Tobias Hartmann wrote: >> Adding a `StressLoopPeeling` dev flag that randomize peeling. >> >> ## Semantics >> >> For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. 
>> >> This requires to distinguish two things: >> - not inlining because it's not legal: see for instance >> ```cpp >> assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); >> ``` >> in `PhaseIdealLoop::do_peeling` >> - not inlining because it doesn't seem profitable. >> >> Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! >> >> Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. >> >> I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. >> >> >> >> ## The Flag >> >> The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. 
A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. >> >> But once again: let'... > > src/hotspot/share/opto/compile.cpp line 666: > >> 664: _congraph(nullptr), >> 665: NOT_PRODUCT(_igv_printer(nullptr) COMMA) >> 666: NOT_PRODUCT(_peeling_rounds_of_node(comp_arena(), 8, 0, Pair(0, 0)) COMMA) > > `NOT_PRODUCT` means that it's also available in the optimized build but you only want/need it in debug. Should I use `#ifdef ASSERT`? My thinking is that I want the code to be there whenever the flag is available. This is decided here: https://github.com/openjdk/jdk/blob/a989245a2424d136f5d2a828eda666c3867b0f48/src/hotspot/share/runtime/flags/jvmFlag.cpp#L552-L556 (called from `JVMFlag::find_flag` with `return_flag == false`, from `Arguments::find_jvm_flag`, from `Arguments::parse_argument` from `Arguments::process_argument`) with https://github.com/openjdk/jdk/blob/a989245a2424d136f5d2a828eda666c3867b0f48/src/hotspot/share/runtime/flags/jvmFlag.cpp#L60-L66 My thinking is that it's unintuitive to me to offer a flag that could have no effect. Why don't we want it in optimized non-product build? We can still stress-peel and even if we won't hit an assert, we could still observe unexpected behaviors or crashes. Does that make sense? 
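The difference between bounding the number of granted peels and bounding the number of peeling opportunities, as argued in the quoted PR description, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the class `PeelingBoundSketch`, both method names, and the fair coin flip are hypothetical and are not taken from the HotSpot patch, which tracks peeling rounds per node in C++.

```java
import java.util.Random;

// Hypothetical simulation, not HotSpot code: compares two ways of bounding stress peeling.
public class PeelingBoundSketch {
    // Policy A: cap the number of times we answer "yes, peel".
    // With enough requests the cap is always reached.
    static int capGrants(Random rng, int requests, int bound) {
        int peels = 0;
        for (int i = 0; i < requests; i++) {
            if (peels < bound && rng.nextBoolean()) {
                peels++;
            }
        }
        return peels;
    }

    // Policy B: cap the number of requests that get a random decision at all.
    // The number of peels then lies anywhere between 0 and the bound.
    static int capOpportunities(Random rng, int requests, int bound) {
        int peels = 0;
        int opportunities = Math.min(requests, bound);
        for (int i = 0; i < opportunities; i++) {
            if (rng.nextBoolean()) {
                peels++;
            }
        }
        return peels;
    }

    public static void main(String[] args) {
        int requests = 10_000;
        int bound = 8;
        for (long seed = 0; seed < 5; seed++) {
            int a = capGrants(new Random(seed), requests, bound);
            int b = capOpportunities(new Random(seed), requests, bound);
            System.out.println("seed " + seed + ": capGrants=" + a + " capOpportunities=" + b);
        }
    }
}
```

Under these assumptions, `capGrants` saturates at the bound for any realistic number of requests, while `capOpportunities` spreads the amount of peeling over the whole range, which is the behaviour the description prefers.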
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2088982156 From pminborg at openjdk.org Wed May 14 13:48:54 2025 From: pminborg at openjdk.org (Per Minborg) Date: Wed, 14 May 2025 13:48:54 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v6] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Wed, 14 May 2025 10:47:40 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). >> >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op >> MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op >> MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op >> MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op > > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - AvoidUnalignedAccesses > - Temp > - The cherry on the cake The changes in `SegmentBulkOperations` look good.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2880312073 From epeter at openjdk.org Wed May 14 13:57:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 13:57:02 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 12:44:09 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 150: > >> 148: * >> 149: *

>> 150: * Given a {@link UnfilledTemplate}, one must apply the required number of arguments, i.e. fill > > Suggestion: > > * Given an {@link UnfilledTemplate}, one must apply the required number of arguments, i.e. fill This is an artefact from the renaming `Template` -> `UnfilledTemplate`. We will have those again if we rename again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089015842 From epeter at openjdk.org Wed May 14 14:06:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:06:40 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v18] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Manuel Hässig Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/95d44f3a..e09b8a11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=16-17 Stats: 25 lines in 7 files changed: 0 ins; 0 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From chagedorn at openjdk.org Wed May 14 14:10:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 May 2025 14:10:51 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases In-Reply-To: References: <5K47IsqYMjuNzxd9ZyBuejxLNpw_y9F0-SaVAEFY3yo=.b41fb57a-78a8-425d-b405-319a13800b0a@github.com> Message-ID: <6yM8EzLBPpXvnTulQIJi8TbNgu5667IhN85Q8YhPLSA=.c63d6c10-ecb8-4c67-b056-75b965174377@github.com> On Tue, 13 May 2025 06:59:10 GMT, Roberto Castañeda Lozano wrote: > > > Thanks for working on this, Manuel, looks very useful! Have you considered using the Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation` that seems related to this. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. > > > We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed. > > > > > > UL is definitely the long-term solution. But given that we have more levels with this new flag (-1 to 6) than UL provides (trace, info, etc.), how could we do it with UL?
> > Good point, I guess we would have to remap the IGV print levels to the (fewer) UL logging levels. It's true that these are probably too many levels. It would just be sad if we wanted to (re-)add another level but had already used all UL levels. But maybe with UL, we want to have different tags or something like that. > But I am also OK with adding a new JVM flag in the context of this RFE and revisiting it when migrating to UL. I agree with that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2880394501 From epeter at openjdk.org Wed May 14 14:12:18 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:12:18 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v19] In-Reply-To: References: Message-ID: <4GMjP_rzqY1q8Nf4fu-cF4WPXH2bj-D3ytw7uv3O4bs=.2870b6c7-8a51-4a4f-b04d-edb0a024cbd9@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: another idea from Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/e09b8a11..ec52074b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=17-18 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 14 14:12:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:12:19 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: <_nMv2NPfdCbtR_FUNc7Mfi4C72d9VD-nkwT4cJn_FAc=.b698385c-f8e5-4bdb-b1c6-7a19b602b38e@github.com> On Wed, 14 May 2025 11:10:58 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 249: > >> 247: // We can define a custom hook. >> 248: // Note: generally we prefer using the pre-defined CLASS_HOOK and METHOD_HOOK from the library, >> 249: // when ever possible. See also the example after this one. > > Perhaps this tutorial should quickly explain the concept of hooks before using them. That might be a duplication of the documentation in `Hook.java`, but it would help the reading flow of the tutorial to grok the concept a bit easier without having to jump between files. > > I guess it is more of a problem when reading the tutorial on Github, as in the IDE you would get a preview for the documentation. > > Perhaps like this: > Suggestion: > > // In this example, we look at the use of Hooks. 
They allow us to refer back in the Template and > // to outer scopes, e.g. to define a field at the top of the class from inside a method. > public static String generateWithCustomHooks() { > // We can define a custom hook. > // Note: generally we prefer using the pre-defined CLASS_HOOK and METHOD_HOOK from the library, > // when ever possible. See also the example after this one. Nice idea. Applied something similar :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089047386 From epeter at openjdk.org Wed May 14 14:15:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:15:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: <0jms935c2JtfbL3ewhsbHPSUJff9rQCwH6Hdz0V3v1Y=.32a4d377-9cdb-4305-8c8e-f3312a0907d8@github.com> On Wed, 14 May 2025 12:51:44 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/NameSet.java line 37: > >> 35: */ >> 36: class NameSet { >> 37: static final Random RANDOM = Utils.getRandomInstance(); > > IIUC, we can generate slightly different programs by sampling variables from name sets. For the purposes of reproducing tests it might be useful to seed the randomness and print the seed. That is exactly what `Utils.getRandomInstance();` already does! It produces lines like this: For random generator using seed: 3152575406766939100 To re-run test with same seed value please add "-Djdk.test.lib.random.seed=3152575406766939100" to command line. 
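The seed-and-print pattern described in this reply can be sketched in a few lines. This is a minimal standalone illustration, not the code of `jdk.test.lib.Utils`: the class `SeededRandom` and its methods are hypothetical, and only the property name `jdk.test.lib.random.seed` is taken from the log output quoted above.

```java
import java.util.Random;

// Hypothetical helper, not jdk.test.lib.Utils: illustrates reproducible randomness.
public class SeededRandom {
    // Property name as it appears in the log message quoted above.
    static final String SEED_PROPERTY = "jdk.test.lib.random.seed";

    // Use the seed given on the command line if present, otherwise pick a fresh one.
    static long chooseSeed() {
        String value = System.getProperty(SEED_PROPERTY);
        return (value != null) ? Long.parseLong(value) : new Random().nextLong();
    }

    // Print the seed so that a failing run can be repeated, then build the generator.
    static Random create(long seed) {
        System.out.println("Using seed: " + seed);
        System.out.println("To re-run with the same seed, add \"-D" + SEED_PROPERTY + "=" + seed + "\"");
        return new Random(seed);
    }

    public static void main(String[] args) {
        Random random = create(chooseSeed());
        // Any sampling done through this generator is repeatable for a given seed.
        System.out.println("sampled index: " + random.nextInt(10));
    }
}
```

Because every random choice goes through one generator whose seed is printed, two runs with the same seed sample the same names and should therefore generate the same program, assuming no other source of nondeterminism.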
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089056289 From mhaessig at openjdk.org Wed May 14 14:15:08 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 14 May 2025 14:15:08 GMT Subject: Integrated: 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock In-Reply-To: References: Message-ID: On Thu, 8 May 2025 13:22:55 GMT, Manuel Hässig wrote: > # Issue Summary > > This PR addresses an `assert(bb->is_reachable())` that is triggered in the code for `-XX:+VerifyStack` after a deoptimization with reason `null_assert_or_unreached0` at a `getstatic` bytecode. Following the `getstatic` is an `areturn` and then an unreachable bytecode. When the code for `VerifyStack` tries to compute an oop map for the basic block of the unreachable bytecode, the assert triggers: > > getstatic Field A.val:"LB"; // if class B is not loaded, C2 deopts with reason "null_assert_or_unreached0" > areturn; > // The following is unreachable > iconst_0; > > > This is a similar problem to [JDK-8271055](https://bugs.openjdk.org/browse/JDK-8271055) (#7331), but this particular deopt with reason `null_assert_or_unreached0` at `getstatic` of a field containing an object reference [deopts at the next bytecode](https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/opto/parse3.cpp#L176-L199). The aforementioned issue introduced a check to skip stack verification of the next bytecode in the code if the execution after the deopted bytecode does not continue at the next bytecode in the code, i.e. does not fall through to the next bytecode.
Unfortunately, this check did not include `areturn` as a bytecode that does not fall-through: > https://github.com/openjdk/jdk/blob/ad07426fab3396caefd7c08d924e085c1f6f61ba/src/hotspot/share/runtime/deoptimization.cpp#L845-L856 > > # Change Summary > > To fix the immediate issue described above, this PR adds `areturn` to the list of bytecodes that does not fall through. However, all return bytecodes exhibit the same behavior and might be susceptible to a similar issue. Even though I was not able to reproduce the same crash with `{d,f,i,l}return` because I could not get those or the preceding bytecode to deopt, I also added them to the `falls_through()` function. For the remaining bytecodes in `falls_through()` with the exception of `athrow` I wrote a regression test. > > # Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14595928439) > - [x] tier1 through tier3 on Oracle supported platforms and OSs plus Oracle internal testing > > # Acknowledgements > Special thanks to @eme64 for his hard work on reducing a reproducer that works on all platforms. This pull request has now been integrated. 
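The idea behind the extended check can be sketched as a simple predicate. This is a hypothetical Java illustration, not the C++ `falls_through()` function from `deoptimization.cpp`: the class name and the use of mnemonic strings are assumptions, and the set below contains only the bytecodes named in this message, so the real list may contain more.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not HotSpot code: decides whether the bytecode that follows
// in the code stream is also the one that executes next.
public class FallsThroughSketch {
    // Only the mnemonics mentioned in the message above; the real list may be longer.
    static final Set<String> NO_FALL_THROUGH =
        Set.of("athrow", "areturn", "dreturn", "freturn", "ireturn", "lreturn");

    static boolean fallsThrough(String mnemonic) {
        return !NO_FALL_THROUGH.contains(mnemonic);
    }

    public static void main(String[] args) {
        // The sequence from the issue summary: the bytecode after areturn is unreachable,
        // so its stack must not be verified.
        List<String> code = List.of("getstatic", "areturn", "iconst_0");
        for (int i = 0; i + 1 < code.size(); i++) {
            String current = code.get(i);
            String next = code.get(i + 1);
            System.out.println(current + " -> " + next + ": verify next = " + fallsThrough(current));
        }
    }
}
```

With `areturn` in the set, the sketch skips the unreachable `iconst_0`, which mirrors the situation in which the assert previously fired.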
Changeset: 97b0dd21 Author: Manuel Hässig Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/97b0dd2167530b3d237e748cd5da0130e38e8af2 Stats: 211 lines in 3 files changed: 209 ins; 2 del; 0 mod 8336906: C2: assert(bb->is_reachable()) failed: getting result from unreachable basicblock Co-authored-by: Emanuel Peter Co-authored-by: Dean Long Reviewed-by: epeter, dlong ------------- PR: https://git.openjdk.org/jdk/pull/25118 From chagedorn at openjdk.org Wed May 14 14:17:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 14 May 2025 14:17:59 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v5] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 09:11:08 GMT, Manuel Hässig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>

>> Output with `-XX:PrintPhaseLevel=2` >> >> >>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java >> CompileCommand: compileonly TestLoop.test10 bool compileonly = true >> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true >> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) >> 3584 99 b 3 TestLoop::test10 (64 bytes) >> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. After Loop Optimizations >> 14. After Macro Expansion >> 15. Barrier expand >> 16. Optimize finished >> 17. Before matching >> 18. After matching >> 19. Global code motion >> 20. Register Allocation >> 21. Final Code >> 3668 103 b 4 TestLoop::test10 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. PhaseIdealLoop iterations 2 >> 14. PhaseIdealLoop iterations 3 >> 15. PhaseIdealLoop iterations 4 >> 16. PhaseIdealLoop iterations 5 >> 17. PhaseIdealLoop iterations 6 >> 18. PhaseIdealLoop iterations 7 >> 19. PhaseIdealLoop iterations 8 >> 20. PhaseIdealLoop iterations 9 >> 21. After Loop Optimizations >> 22. After Macro Expansion >> 23. Barrier expand >> 24. Optimize finished >> 25. Before matching >> 26. After matching >> 27. Global code motion >> 28. Registe... > > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Move functions into !PRODUCT Looks good to me otherwise! 
src/hotspot/share/opto/c2_globals.hpp line 395: > 393: "6=all details printed. " \ > 394: "Level of detail of printouts can be set on a per-method level " \ > 395: "as well by using CompileCommand=option.") \ Suggestion: "as well by using CompileCommand=PrintPhaseLevel.") \ src/hotspot/share/opto/compile.cpp line 5180: > 5178: bool Compile::should_print_phase(const int level) const { > 5179: return PrintPhaseLevel > 0 && directive()->PhasePrintLevelOption >= level && > 5180: _method != nullptr; // Do not print phases for stubs. Maybe align: Suggestion: return PrintPhaseLevel > 0 && directive()->PhasePrintLevelOption >= level && _method != nullptr; // Do not print phases for stubs. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25183#pullrequestreview-2833207034 PR Review Comment: https://git.openjdk.org/jdk/pull/25183#discussion_r2084665235 PR Review Comment: https://git.openjdk.org/jdk/pull/25183#discussion_r2089049805 From epeter at openjdk.org Wed May 14 14:19:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:19:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v20] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano - Apply suggestions from code review Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/ec52074b..63d0326e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=18-19 Stats: 12 lines in 4 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 14 14:28:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:28:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:33:55 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 328: > >> 326: /** >> 327: * Retrieves the dollar replacement of the {@code 'name'} for the >> 328: * current Template that is being instanciated. It returns the same > > Suggestion: > > * current Template that is being instantiated. It returns the same Manuel also suggested it :) > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 96: > >> 94: // - For a chosen type, operator, and generator. >> 95: // - The variable name "GOLD" and the test name "test" would get conflicts >> 96: // if we instanciate the template multiple times. Thus, we use the $ prefix > > Suggestion: > > // if we instantiate the template multiple times.
Thus, we use the $ prefix Manuel also suggested it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089082778 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089082333 From epeter at openjdk.org Wed May 14 14:31:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:31:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:36:23 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 390: > >> 388: * @param The type of the value. >> 389: * @param function The function that is applied with the provided {@code 'value'}. >> 390: * @return A token that does nothing, so that the {@link #let} cal can easily be put in a list of tokens > > The return type is something else in this case, no? Oh, good catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089089776 From epeter at openjdk.org Wed May 14 14:36:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:36:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 11:51:03 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/README.md line 6: > >> 4: We want to make it easy to generate variants of tests. Often, we would like to have a set of tests, corresponding to a set of types, a set of operators, a set of constants, etc. Writing all the tests by hand is cumbersome or even impossible. 
When generating such tests with scripts, it would be preferable if the code generation happens automatically, and the generator script was checked into the code base. Code generation can go beyond simple regression tests, and one might want to generate random code from a list of possible templates, to fuzz individual Java features and compiler optimizations. >> 5: >> 6: The Template Framework provides a facility to generate code with Templates. Templates are essencially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. > > Suggestion: > > The Template Framework provides a facility to generate code with Templates. A Template is essentially a list of tokens that are concatenated (i.e. rendered) to a String. The Templates can have "holes", which are filled (replaced) by different values at each Template instantiation. For example, these "holes" can be filled with different types, operators or constants. Templates can also be nested, allowing a modular use of Templates. Nice, fixed :) > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 46: > >> 44: * >> 45: *

>> 46: * The Template Framework provides a facility to generate code with Templates. Templates are essencially a list > > Same grammar/spelling glitch as in the README file. Fixed! > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 405: > >> 403: >> 404: /** >> 405: * The default amount of fuel spent per Template. It is suptracted from the current {@link #fuel} at every > > Suggestion: > > * The default amount of fuel spent per Template. It is subtracted from the current {@link #fuel} at every Manuel already caught it! > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 411: > >> 409: >> 410: /** >> 411: * The current remaining fuel for nested Templates. Every level of Template nestig > > Suggestion: > > * The current remaining fuel for nested Templates. Every level of Template nesting Manuel already caught it! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089100667 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089099220 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089096506 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089097427 From epeter at openjdk.org Wed May 14 14:43:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:43:40 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v21] In-Reply-To: References: Message-ID: <8F8AYnD4u0Yr7rknjiEDC3dHppW_TzmUQhn1R3JzsF8=.3649dcaf-ee3c-4405-a47a-2d9c16955ab3@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 - more small fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/63d0326e..725f6179 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=19-20 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 14 14:51:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 14:51:58 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: <7QJJnZqTbVS7CwiGylXaI_7r1yCeQcH7JqPUechIB4A=.c40078f0-89b4-4c9b-a744-7aa429a35100@github.com> On Wed, 14 May 2025 13:20:00 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 394: > >> 392: * @throws RendererException if there is a duplicate hashtag {@code key}. >> 393: */ >> 394: static TemplateBody let(String key, T value, Function function) { > > I found it a bit confusing to find two methods called `let` that are pretty different in nature. Maybe you could rename this one to e.g. `letIn`? 
@robcasloz They do pretty much the same thing, though: both allow you to set a hashtag replacement. It is just a question of where you can place it, and whether it also captures the value in a Java variable. What do you mean to suggest with the name `letIn`? > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 461: > >> 459: * @param name The {@link Name} to be added to the current code frame. >> 460: * @return The token that performs the defining action. >> 461: */ > > The concept of "code frame" is not clear here, maybe you can introduce it or replace it by a concept that is already defined? I specified it a little: ~ 453 * Add a {@link Name} in the current scope, i.e. the innermost of either + 454 * {@link Template#body} or {@link Hook#set}. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089135773 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089132856 From epeter at openjdk.org Wed May 14 15:02:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 15:02:57 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: <1KfpDttB7jzWLna3uL6ZT2yvC43YL_2qvYSfS61ctXs=.4a9e5e00-4ba1-4a53-9056-02420c2890b4@github.com> On Wed, 14 May 2025 11:55:32 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 50: > >> 48: * filled (replaced) by different values at each Template instantiation. For example, these "holes" can >> 49: * be filled with different types, operators or constants. Templates can also be nested, allowing a modular >> 50: * use of Templates.
> > I suggest replacing this text with a high-level summary of what is already written in the README file (or the other way round, write a short summary in the README file and point to this file for more background). I remember discussing with @chhagedorn about what goes in the README and what in the Java file. Personally, I think as much as possible should go into the Java file. I think @chhagedorn prefers the README. So I ended up with a bit of a compromise, putting the same introduction in both places, but doing the details in the Java file. And I think I want to keep it this way. Unless @chhagedorn is ok with it if we just remove all but the first paragraph, and only refer to the Java file docs. What I'll do now anyway, based on your suggestion @robcasloz : link from the README to the Java file. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089159984 From epeter at openjdk.org Wed May 14 15:08:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 15:08:49 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v22] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). 
We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
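The `#name` and `$var` replacements listed in the description can be illustrated with a small stand-alone sketch. This is not the Template Framework's implementation or API: `RenameSketch`, `render`, and the integer `id` parameter are hypothetical names chosen for illustration, and the real framework concatenates `Token`s rather than scanning one String. The sketch only assumes what the description states, namely that `#name` is replaced by a supplied value and that `$var` becomes a name such as `var_7` that is unique per template use.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; NOT the Template Framework API.
class RenameSketch {
    // Matches "#name" (hashtag replacement) or "$name" (name to be made unique).
    private static final Pattern TOKEN = Pattern.compile("([#$])(\\w+)");

    // Renders one use of a template body. The id plays the role of the
    // per-use suffix (assumption: the real framework picks it internally).
    static String render(String body, Map<String, String> hashtags, int id) {
        Matcher m = TOKEN.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (m.group(1).equals("$")) {
                // "$var" becomes "var_<id>", so two uses do not collide.
                replacement = name + "_" + id;
            } else {
                // "#name" is filled with the value supplied for this use.
                replacement = hashtags.get(name);
                if (replacement == null) {
                    throw new IllegalArgumentException("no value for #" + name);
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "int $var = #value;";
        // Same body, two uses: the variable names differ, so both
        // statements could be placed in the same scope.
        System.out.println(render(body, Map.of("value", "42"), 7));
        System.out.println(render(body, Map.of("value", "7"), 42));
    }
}
```

Rendering the same body twice with different ids yields two declarations with distinct variable names, which is the collision problem that `$var` is described as solving.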
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more applied suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/725f6179..ed1eead6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=20-21 Stats: 13 lines in 3 files changed: 6 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 14 15:08:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 15:08:50 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: <1KfpDttB7jzWLna3uL6ZT2yvC43YL_2qvYSfS61ctXs=.4a9e5e00-4ba1-4a53-9056-02420c2890b4@github.com> References: <1KfpDttB7jzWLna3uL6ZT2yvC43YL_2qvYSfS61ctXs=.4a9e5e00-4ba1-4a53-9056-02420c2890b4@github.com> Message-ID: On Wed, 14 May 2025 15:00:24 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 50: >> >>> 48: * filled (replaced) by different values at each Template instantiation. For example, these "holes" can >>> 49: * be filled with different types, operators or constants. Templates can also be nested, allowing a modular >>> 50: * use of Templates. >> >> I suggest replacing this text with a high-level summary of what is already written in the README file (or the other way round, write a short summary in the README file and point to this file for more background). > > I remember discussing with @chhagedorn about what goes in the README and what in the Java file. Personally, I think as much as possible should go into the Java file. I think @chhagedorn prefers the README. 
So I ended up with a bit of a compromise, putting the same introduction in both places, but doing the details in the Java file. And I think I want to keep it this way. Unless @chhagedorn is ok with it if we just remove all but the first paragraph, and only refer to the Java file docs. > > What I'll do now anyway, based on your suggestion @robcasloz : link from the README to the Java file. And: in the Java file, I move the comment about the CompileFramework up, to mirror the README. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2089165536 From mhaessig at openjdk.org Wed May 14 15:00:30 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 14 May 2025 15:00:30 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases In-Reply-To: <6yM8EzLBPpXvnTulQIJi8TbNgu5667IhN85Q8YhPLSA=.c63d6c10-ecb8-4c67-b056-75b965174377@github.com> References: <5K47IsqYMjuNzxd9ZyBuejxLNpw_y9F0-SaVAEFY3yo=.b41fb57a-78a8-425d-b405-319a13800b0a@github.com> <6yM8EzLBPpXvnTulQIJi8TbNgu5667IhN85Q8YhPLSA=.c63d6c10-ecb8-4c67-b056-75b965174377@github.com> Message-ID: <6LH-wBvGZxtujmnWOfHV8iBeBHWD3KmQ8Jddk4cgAmY=.971d802f-3bab-40f7-96ba-25291f9cec39@github.com> On Wed, 14 May 2025 14:08:40 GMT, Christian Hagedorn wrote: >>> > Thanks for working on this, Manuel, looks very useful! Have you considered using Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation` that seems related to this. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. >>> > We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed. >>> >>> UL is definitely the long-term solution. But given that we have more levels with this new flag (-1 to 6) than UL provides (trace, info, etc.), how could we do it with UL?
>> >> Good point, I guess we would have to remap the IGV print levels to the (fewer) UL logging levels. I think that would be OK, we probably do not need that many different print levels for IGV anyway. But I am also OK with adding a new JVM flag in the context of this RFE and revisiting it when migrating to UL. > >> > > Thanks for working on this, Manuel, looks very useful! Have you considered using Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation` that seems related to this. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. >> > > We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed. >> > >> > >> > UL is definitely the long-term solution. But given that we have more levels with this new flag (-1 to 6) than UL provides (trace, info, etc.), how could we do it with UL? >> >> Good point, I guess we would have to remap the IGV print levels to the (fewer) UL logging levels. > > It's true that these are probably too many levels. It would just be sad if we wanted to (re-)add another level but had already used all UL levels. But maybe with UL, we want to have different tags or something like that. > >> But I am also OK with adding a new JVM flag in the context of this RFE and revisiting it when migrating to UL. > > I agree with that. Thank you for the review and the suggestions @chhagedorn.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2880561555 From kvn at openjdk.org Wed May 14 15:10:08 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 May 2025 15:10:08 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 18:03:09 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Update test to make it more resilient > > Signed-off-by: Ashutosh Mehra > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. > > Signed-off-by: Ashutosh Mehra > - Add test for using AOTCodeCache with different CompressedOops > configuration > > Signed-off-by: Ashutosh Mehra > - Add check for compressed oops base address; minor refacotring > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 My testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25019#pullrequestreview-2840581163 From epeter at openjdk.org Wed May 14 15:14:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 14 May 2025 15:14:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v17] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:13:08 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Review suggestions by Roberto >> >> Co-authored-by: Roberto Castañeda Lozano > > Thank you, Emanuel, for working on this! I'm already looking forward to using it. > > I did a superficial pass to get an overview and gain understanding. Apart from some typos, my main concern is reproducibility with the randomness introduced in `NameSet`. @mhaessig I applied all your suggestions! @robcasloz I applied everything except the suggestion in https://github.com/openjdk/jdk/pull/24217#discussion_r2088927510. If you still want me to do it, we can continue discussing :) ------------ Offline, we have spent quite a lot of time discussing the naming of the Template, i.e. my original `Template(WithArgs)` vs what we have now `(Un)FilledTemplate`, or further alternatives. The best suggestion so far is `TemplateWithFreeArgs` for one/two/three args where the args are not yet applied, and `RenderableTemplate` for zero, and one/two/three with applied args. We are still discussing what the transition name should be; the best options so far are: `with`, `withArgs`, `applyArgs`, `apply`.
As explained offline: making this renaming is surprisingly time consuming, so I will only do it once I get the final approval from you both @chhagedorn @robcasloz , and maybe even @mhaessig :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2880614203 From mhaessig at openjdk.org Wed May 14 15:00:30 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 14 May 2025 15:00:30 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v6] In-Reply-To: References: Message-ID: > This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >

> Output with `-XX:PrintPhaseLevel=2` > > >> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java > CompileCommand: compileonly TestLoop.test10 bool compileonly = true > CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true > 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) > 3584 99 b 3 TestLoop::test10 (64 bytes) > 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > 8. PhaseIdealLoop 3 > 9. Before PhaseCCP 1 > 10. PhaseCCP 1 > 11. Iter GVN 2 > 12. PhaseIdealLoop iterations > 13. After Loop Optimizations > 14. After Macro Expansion > 15. Barrier expand > 16. Optimize finished > 17. Before matching > 18. After matching > 19. Global code motion > 20. Register Allocation > 21. Final Code > 3668 103 b 4 TestLoop::test10 (64 bytes) > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > 8. PhaseIdealLoop 3 > 9. Before PhaseCCP 1 > 10. PhaseCCP 1 > 11. Iter GVN 2 > 12. PhaseIdealLoop iterations > 13. PhaseIdealLoop iterations 2 > 14. PhaseIdealLoop iterations 3 > 15. PhaseIdealLoop iterations 4 > 16. PhaseIdealLoop iterations 5 > 17. PhaseIdealLoop iterations 6 > 18. PhaseIdealLoop iterations 7 > 19. PhaseIdealLoop iterations 8 > 20. PhaseIdealLoop iterations 9 > 21. After Loop Optimizations > 22. After Macro Expansion > 23. Barrier expand > 24. Optimize finished > 25. Before matching > 26. After matching > 27. Global code motion > 28. Register Allocation > 29. Final Code > >
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo... Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from @chhagedorn Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25183/files - new: https://git.openjdk.org/jdk/pull/25183/files/06b9d49a..69873d35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25183&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25183/head:pull/25183 PR: https://git.openjdk.org/jdk/pull/25183 From sparasa at openjdk.org Wed May 14 15:33:53 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 14 May 2025 15:33:53 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 06:29:19 GMT, Tobias Hartmann wrote: > Looks good to me. Testing on our side passed. Thank you, Tobias! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25101#issuecomment-2880676129 From sparasa at openjdk.org Wed May 14 15:33:53 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 14 May 2025 15:33:53 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: <-KhR1iHMlrE3DxQflBGzBrtriF2bfiA0C4eLacFF-Uc=.3d324ec2-38e4-4119-9fcd-ae59338bde5d@github.com> References: <-KhR1iHMlrE3DxQflBGzBrtriF2bfiA0C4eLacFF-Uc=.3d324ec2-38e4-4119-9fcd-ae59338bde5d@github.com> Message-ID: On Wed, 14 May 2025 11:35:03 GMT, Jatin Bhateja wrote: > Fix looks good to me. Thanks Jatin! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25101#issuecomment-2880679618 From duke at openjdk.org Wed May 14 15:33:54 2025 From: duke at openjdk.org (duke) Date: Wed, 14 May 2025 15:33:54 GMT Subject: RFR: 8356281: Fix for TestFPComparison failure due to incorrect result [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 22:46:10 GMT, Srinivas Vamsi Parasa wrote: >> This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. >> >> The test passes after using this fix as shown below: >> >> Passed: compiler/c2/irTests/TestFPComparison.java >> Test results: passed: 1 >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR SKIP >> jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java >> 1 1 0 0 0 >> ============================== >> TEST SUCCESS > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > add TEMP(dst) @vamsi-parasa Your change (at version 3fb569cf01839eae4721f8a8c4c4dd38fc53adf4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25101#issuecomment-2880682066 From adinn at openjdk.org Wed May 14 15:38:58 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Wed, 14 May 2025 15:38:58 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v6] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Wed, 14 May 2025 10:47:40 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). 
>> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
>> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
>> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
>> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
>> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
> > Andrew Haley has updated the pull request incrementally with three additional commits since the last revision: > > - AvoidUnalignedAccesses > - Temp > - The cherry on the cake Looks good to me. ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25147#pullrequestreview-2840674777 From sparasa at openjdk.org Wed May 14 15:42:00 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 14 May 2025 15:42:00 GMT Subject: Integrated: 8356281: Fix for TestFPComparison failure due to incorrect result In-Reply-To: References: Message-ID: On Wed, 7 May 2025 16:05:53 GMT, Srinivas Vamsi Parasa wrote: > This PR fixes the cause of failure in TestFPComparison while using APX NDD instructions. > > The test passes after using this fix as shown below: > > Passed: compiler/c2/irTests/TestFPComparison.java > Test results: passed: 1 > > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR SKIP > jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java > 1 1 0 0 0 > ============================== > TEST SUCCESS This pull request has now been integrated.
Changeset: 10436c1e Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/10436c1e1d0a14ef8ba4d58babb23fc47c949a6f Stats: 24 lines in 1 file changed: 9 ins; 0 del; 15 mod 8356281: Fix for TestFPComparison failure due to incorrect result Reviewed-by: sviswanathan, thartmann, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/25101 From never at openjdk.org Wed May 14 16:01:57 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 14 May 2025 16:01:57 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v15] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 08:38:50 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciRuntime.cpp line 852: >> >>> 850: >>> 851: void JVMCINMethodData::relocate_nmethod_mirror(nmethod* nm) { >>> 852: oop nmethod_mirror = get_nmethod_mirror(nm, /* phantom_ref */ false); >> >> Why is phantom false? > > I assume that's copied from `JVMCINMethodData::invalidate_nmethod_mirror` which was updated in https://github.com/openjdk/jdk/commit/f81c192da929d72be5134ccf195be2a985737504. The description for [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359) implies that this somehow avoids enqueuing potentially dead object to the SATB buffer. Is that what we want here @tkrodriguez ? It should be passing true here as we are not in the middle of a GC so it should be alive and valid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2089274697 From kvn at openjdk.org Wed May 14 16:26:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 14 May 2025 16:26:52 GMT Subject: RFR: 8356946: x86: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 14 May 2025 09:48:54 GMT, Aleksey Shipilev wrote: > Noticed two awkward things in current x86 interpreter profiling code. > > First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. 
> > Second, we care about overflows of 64-bit counters for some reason. I think this is a remnant of 32-bit x86 support, where we can plausibly have a 32-bit counter overflow in a reasonable timeframe. But for a 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. > > So we can save a few instructions / memory accesses on this path. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [ ] Linux x86_64 server fastdebug, `all` Looks good. src/hotspot/cpu/x86/interp_masm_x86.hpp line 217: > 215: void increment_mdp_data_at(Address data, bool decrement = false); > 216: void increment_mdp_data_at(Register mdp_in, int constant, > 217: bool decrement = false); `decrement` is never used? ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25223#pullrequestreview-2840817142 PR Review Comment: https://git.openjdk.org/jdk/pull/25223#discussion_r2089316142 From asmehra at openjdk.org Wed May 14 16:34:55 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 14 May 2025 16:34:55 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 15:17:02 GMT, Andrew Dinn wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Remove irrelevant comment >> >> Signed-off-by: Ashutosh Mehra >> - Fix win64 compile failures >> >> Signed-off-by: Ashutosh Mehra >> - Fix AOTCodeFlags.java test >> >> Signed-off-by: Ashutosh Mehra >> - Fix compile failure in minimal config >> >> Signed-off-by: Ashutosh Mehra >> - Revert back changes that added AOTRuntimeConstants.
>> Ensure CompressedOops::base and CompressedKlssPointers::base does not >> change in production run >> >> Signed-off-by: Ashutosh Mehra >> - Fix merge conflicts >> >> Signed-off-by: Ashutosh Mehra >> - Store/load AsmRemarks and DbgStrings in aot code cache >> >> Signed-off-by: Ashutosh Mehra >> - Add missing external address in aarch64 >> >> Signed-off-by: Ashutosh Mehra >> - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab > > Having discussed this with @fisk it appears that the weak reference load performed by the c2i adapters will not attempt a decode. The barrier load_at method only performs a decode when the decorators include `IN_HEAP`. `resolve_weak_handle` passes in the `IN_NATIVE` decorator which implies no decode should be performed. > > So, this means we can use the adapters even if the compressed oop base differs between training run and production run. @adinn can you please review this as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2880863631 From asmehra at openjdk.org Wed May 14 16:39:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Wed, 14 May 2025 16:39:56 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: <8W_FRkLbamdZ6l0Lkbn8WqXv_JXPjG-i5hBus2foor4=.4f80cd55-4141-46ff-8436-0cbbc9228461@github.com> Message-ID: On Mon, 12 May 2025 23:07:02 GMT, Vladimir Kozlov wrote: >>> I think for these changes we should not use AOT code when the heap base does not match. >> Something changed in compressed oops code which prevents enforcing encoding. >> We can investigate and fix it later. >> >> @vnkozlov for this PR we are relying on having relocation for COOP base, not on enforcing encoding. And that should be able to handle cases where heap base is different in assembly vs prod. Why do you suggest to not use AOT code when the heap base does not match? > > @ashu-mehra, this looks good with few comments. 
After you address them, please merge latest jdk - I pushed small related change to limit platforms to run with AOT. > > After that I will submit new testing. @vnkozlov thank you for reviewing and testing it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2880874827 From yzheng at openjdk.org Wed May 14 19:50:55 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 May 2025 19:50:55 GMT Subject: RFR: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:16:26 GMT, Yudi Zheng wrote: > HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. Thanks for the review! Passed tier1-3. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25225#issuecomment-2881367102 From yzheng at openjdk.org Wed May 14 19:50:55 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 14 May 2025 19:50:55 GMT Subject: Integrated: 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:16:26 GMT, Yudi Zheng wrote: > HotSpot selects between AVX512 and AVX2 implementations of array sort/partition stubs based on the return value of VM_Version::supports_avx512_simd_sort. The AVX2 version supports fewer element types than the AVX512 version and may fail at runtime if unsupported types are encountered. 
This capability information should be exposed to the JVMCI compiler to properly guard against incorrect intrinsification. This is especially important because VM_Version::supports_avx512_simd_sort includes a special exclusion rule for AMD Zen4, due to performance considerations. This pull request has now been integrated. Changeset: 948ade8e Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/948ade8e7003a41683600428c8e3155c7ed798db Stats: 4 lines in 3 files changed: 4 ins; 0 del; 0 mod 8356971: [JVMCI] Export VM_Version::supports_avx512_simd_sort to JVMCI compiler Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/25225 From avoitylov at openjdk.org Wed May 14 20:11:57 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Wed, 14 May 2025 20:11:57 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic uses a shift instead of a multiplication, since a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power-of-2.
To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. Thanks Andrew! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2881424400 From duke at openjdk.org Wed May 14 20:11:57 2025 From: duke at openjdk.org (duke) Date: Wed, 14 May 2025 20:11:57 GMT Subject: RFR: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates additional nop with madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". Current VectorizedHashCode intrinsic calculates byte offset to jump inside the unrolled loop code. It assumes 2 instructions per each unrolled iteration (load and madd). JDK-8079203 adds additional nop for Cortex-A53, which breaks offset calculation logic. > > Current offset calculation logic uses a shift instead of a multiplication, since a power-of-2 number of instructions is present in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power-of-2.
To account for that, the shift argument for offset calculation logic is increased by 1, because each loop iteration has 2 times more instructions on Cortex-A53. > > This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running the initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. @voitylov Your change (at version 69bd1e2fdb6b29f16767cf839cd60f1fb20f7fb4) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24489#issuecomment-2881426030 From iveresov at openjdk.org Wed May 14 23:39:43 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 14 May 2025 23:39:43 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v21] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 84 commits:
The pull request now contains 84 commits: - Merge branch 'master' into pp2 - Address Ioi's comments - Merge branch 'master' into pp2 - Address Ioi's comments - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Reviewed-by: vlivanov, kvn - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing Reviewed-by: naoto - 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Reviewed-by: wkemper - 8356819: [macos] MacSign should use "openssl" and "faketime" from Homebrew by default Reviewed-by: asemenyuk - 8356107: [java.lang] Use @requires tag instead of exiting based on os.name or separatorChar property Reviewed-by: naoto, bpb - 8356447: Change default for EagerJVMCI to true Reviewed-by: yzheng, kvn, never - ... and 74 more: https://git.openjdk.org/jdk/compare/5e50a584...d3d51b00 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=20 Stats: 3332 lines in 59 files changed: 3118 ins; 100 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From sparasa at openjdk.org Thu May 15 01:24:51 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 15 May 2025 01:24:51 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v22] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Fix for UseAddressNop related failure ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/8761f770..4e749710 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=20-21 Stats: 8 lines in 2 files changed: 1 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From jkarthikeyan at openjdk.org Thu May 15 02:33:30 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 15 May 2025 02:33:30 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 Message-ID: Hi all, This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. Reviews would be appreciated! 
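For illustration only (this is not the patch itself): a jtreg test is commonly restricted to VMs with C2 available through a `@requires` tag. The sketch below assumes the `vm.compiler2.enabled` property is what gates the test; the class name `ExampleC2OnlyTest` and its body are hypothetical and much smaller than the real test.

```java
/*
 * @test
 * @summary Hypothetical sketch: run only when the C2 compiler is available.
 * @requires vm.compiler2.enabled
 * @run main ExampleC2OnlyTest
 */
public class ExampleC2OnlyTest {
    // Sums the leading-zero counts over a small range [from, to).
    // A real stress test would cover a far larger range, which is
    // why it becomes slow when C2 is not available.
    static int sumLeadingZeros(int from, int to) {
        int sum = 0;
        for (int i = from; i < to; i++) {
            sum += Integer.numberOfLeadingZeros(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLeadingZeros(1, 5));
    }
}
```

With such a tag, jtreg would skip the test instead of running it when C2 is disabled, for example under `-XX:TieredStopAtLevel=3`.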
------------- Commit messages: - Only run TestVectorZeroCount with C2 Changes: https://git.openjdk.org/jdk/pull/25243/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25243&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355512 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25243.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25243/head:pull/25243 PR: https://git.openjdk.org/jdk/pull/25243 From jkarthikeyan at openjdk.org Thu May 15 02:33:57 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 15 May 2025 02:33:57 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12] In-Reply-To: References: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com> Message-ID: On Mon, 12 May 2025 06:27:12 GMT, Emanuel Peter wrote: >> @eme64 Thanks for the testing results! It looks like byte<->long conversion isn't supported with AVX1, so I've pushed a small to make the test to check for AVX2 in those cases instead. > > @jaskarth Excellent! I'll run another round of testing, just to be sure :) > Please ping me again in 24h! @eme64 Pinging for test results :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2882036030 From stuefe at openjdk.org Thu May 15 04:06:52 2025 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 15 May 2025 04:06:52 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases In-Reply-To: <6yM8EzLBPpXvnTulQIJi8TbNgu5667IhN85Q8YhPLSA=.c63d6c10-ecb8-4c67-b056-75b965174377@github.com> References: <5K47IsqYMjuNzxd9ZyBuejxLNpw_y9F0-SaVAEFY3yo=.b41fb57a-78a8-425d-b405-319a13800b0a@github.com> <6yM8EzLBPpXvnTulQIJi8TbNgu5667IhN85Q8YhPLSA=.c63d6c10-ecb8-4c67-b056-75b965174377@github.com> Message-ID: On Wed, 14 May 2025 14:08:40 GMT, Christian Hagedorn wrote: > > > > Thanks for working on the Manuel, looks very useful! 
Have you considered using Unified Logging (UL) instead of creating a new JVM flag for this? We already have `-Xlog:jit+compilation` that seems related to this. You might print the compile phase information with e.g. `-Xlog:jit+compilation=trace`, or add a new UL tag if necessary. > > > > We want to move towards using the UL framework in the JVM compiler components, now that the preparation work by @anton-seoane is completed. > > > > > > > > > UL is definitely the long-term solution. But given that we have more levels with this new flag (-1 to 6) than UL provides (trace, info, etc.), how could we do it with UL? > > > > > > Good point, I guess we would have to remap the IGV print levels to the (fewer) UL logging levels. > > It's true that these are probably too many levels. It would just be sad when we want to (re-)add another level but we already used all UL levels. But maybe with UL, we want to have different tags or something like that. > > > But I am also OK with adding a new JVM flag in the context of this RFE and revisiting it when migrating to UL. > > I agree with that. Drive-by comment: if we want to use UL (it has many benefits besides cutting down on the many switches), then UL should fit all our needs. If UL has too few tracing levels for all usages, then we need to add one. I also sometimes found four levels too few. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2882175296 From chagedorn at openjdk.org Thu May 15 06:35:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 06:35:06 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v6] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 15:00:30 GMT, Manuel Hässig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal.
This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>
>> Output with `-XX:PrintPhaseLevel=2` >> >> >>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java >> CompileCommand: compileonly TestLoop.test10 bool compileonly = true >> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true >> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) >> 3584 99 b 3 TestLoop::test10 (64 bytes) >> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. After Loop Optimizations >> 14. After Macro Expansion >> 15. Barrier expand >> 16. Optimize finished >> 17. Before matching >> 18. After matching >> 19. Global code motion >> 20. Register Allocation >> 21. Final Code >> 3668 103 b 4 TestLoop::test10 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. PhaseIdealLoop iterations 2 >> 14. PhaseIdealLoop iterations 3 >> 15. PhaseIdealLoop iterations 4 >> 16. PhaseIdealLoop iterations 5 >> 17. PhaseIdealLoop iterations 6 >> 18. PhaseIdealLoop iterations 7 >> 19. PhaseIdealLoop iterations 8 >> 20. PhaseIdealLoop iterations 9 >> 21. After Loop Optimizations >> 22. After Macro Expansion >> 23. Barrier expand >> 24. Optimize finished >> 25. Before matching >> 26. After matching >> 27. Global code motion >> 28. Registe... 
> > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25183#pullrequestreview-2842426158 From duke at openjdk.org Thu May 15 06:35:43 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 15 May 2025 06:35:43 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v16] In-Reply-To: References: Message-ID: > In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: >
> |name | -MergeLoads | +MergeLoads |delta|
> |---|---|---|---|
> |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 |
> |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 |
> |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 |
> |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 |
> |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 |
> |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 |
> |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 |
> |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 |
> |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 |
> |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 |
> |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 |
> |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 |
> |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 |
> |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 |
> |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 |
> |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 |
> |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 |
> |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 |
> |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 |
> |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 |
> |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 |
> |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 |
> |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 |
> |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 |
> |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 |
> |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 |
> |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 |
> |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 |
> |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 |
> |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 |
> |Merg...
kuaiwei has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Fix test error after merging - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Fix for comments - Fix build error on mac and windows - Add check flag for combine operator - Make MergeLoadInfoList an in-place growable array - Fix for comments - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Merge remote-tracking branch 'origin/master' into dev/merge_loads - Remove unused code - ... and 12 more: https://git.openjdk.org/jdk/compare/5e50a584...c7dd91a2 ------------- Changes: https://git.openjdk.org/jdk/pull/24023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=15 Stats: 2691 lines in 17 files changed: 2642 ins; 0 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From mchevalier at openjdk.org Thu May 15 06:40:10 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 15 May 2025 06:40:10 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v2] In-Reply-To: References: Message-ID: > Adding a `StressLoopPeeling` dev flag that randomizes peeling.
> > ## Semantics > > For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. > > This requires distinguishing two things: > - not peeling because it's not legal: see for instance > ```cpp > assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); > ``` > in `PhaseIdealLoop::do_peeling` > - not peeling because it doesn't seem profitable. > > Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! > > Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. > > I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. The bound only saves us from some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seem reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. > > > > ## The Flag > > The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see".
My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. > > But once again: let's see what happens. > > > ## On the Code > > The field `_peel... Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into feat/stress/loop/peeling - Move peel count in LoopNode - interface - Limit peeling - A simpler implementation - Peel more! 
- Stress loop peeling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25140/files - new: https://git.openjdk.org/jdk/pull/25140/files/7e22ea0b..8ce4fa35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=00-01 Stats: 23826 lines in 992 files changed: 12916 ins; 5871 del; 5039 mod Patch: https://git.openjdk.org/jdk/pull/25140.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25140/head:pull/25140 PR: https://git.openjdk.org/jdk/pull/25140 From chagedorn at openjdk.org Thu May 15 06:40:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 06:40:53 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: Message-ID: On Thu, 15 May 2025 02:29:11 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. > > Reviews would be appreciated! Looks good, thanks for fixing it! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25243#pullrequestreview-2842438852 From epeter at openjdk.org Thu May 15 06:54:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 06:54:56 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 03:11:52 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. 
Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >>
>>                                                Baseline                    Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units     Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op    56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op    43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op    29.757 ± 1.055  ns/op  (8.25x)
>> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Check for AVX2 for byte/long conversions Testing all passed! Approved, thanks for the work @jaskarth ! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-2842475702 From epeter at openjdk.org Thu May 15 06:59:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 06:59:55 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v12] In-Reply-To: References: <05AJmJd1G9_Z5TzYb6kuA1KcXqN96C2-ivfhnstgfCM=.aadfc52f-7748-4abb-a497-8f5049ab608b@github.com> Message-ID: On Thu, 15 May 2025 02:30:46 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Excellent!
I'll run another round of testing, just to be sure :) >> Please ping me again in 24h! > > @eme64 Pinging for test results :) @jaskarth I do have one more question: I have seen patterns with `LShift` and `RShift` for sign extension in the past. For example, if we cast a signed `byte` to a signed `short`, we have to do sign extension, right? Do any of these patterns appear in your benchmarks here? Do you know under which circumstances they appear? Maybe the simple conversion examples we have here are too limited to cover all cases. Not saying that should stop integration, just wondering if it is worth finding an example and filing an RFE just to keep this as a direction to explore in the future ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2882768635 From epeter at openjdk.org Thu May 15 07:24:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 07:24:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v22] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 15:08:49 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`.
This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more applied suggestions Repeating from internal discussion, here my newest proposal: We could: - Move all static methods from `Template` -> `TemplateUtils` , that frees up the name Template . 
Alternatively, we just keep the static methods in `Template`. - `Template.make` -> produces a `Template`, specifically `Template.ZeroArgs`, `Template.OneArgs`, ... - A `Template` can then either be converted to a `Token`: `template.asToken(...args...)` - Or it can be rendered: `template.render(...args...)`. To allow custom fuel, we may have to use something like `template.renderWithFuel(fuel, ...args...)` This way, we do not have two kinds of `Template` (args not applied vs args already applied). Instead, we just have two functions: `asToken` and `render`. This hides the internal state. We may still have an internal class, and have to find out what to call it. But the naming is less important for a hidden class. I may call it `TemplateWithBoundArgs`, even for the `ZeroArgs` case. We can ask the reader of the internal code for forgiveness with a comment, that we are saying that all args are bound even when there are no args ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2882822708 From epeter at openjdk.org Thu May 15 07:45:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 07:45:45 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store Message-ID: **Summary** Before [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155) / https://github.com/openjdk/jdk/pull/18822, we used to prefer aligning to stores. But in that change, I removed that preference, and since then we have been aligning to loads instead (there is no preference, but since loads usually come before stores in the loop body, the load gets picked). This led to a performance regression, especially on `x64`. Especially on `x64`, it is more important to align stores than to align loads. This is because memory operations that cross a cacheline boundary are split. And `x64` CPUs generally have more throughput for loads than for stores, so splitting a store is worse than splitting a load.
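As a rough illustration of the splitting argument above, the following sketch counts how many accesses in a contiguous run touch two cachelines. It assumes 64-byte cachelines and hand-picked byte addresses; `CachelineSplit`, `crossesCacheline` and `countSplits` are hypothetical names used only for this illustration and are not part of the PR.

```java
public class CachelineSplit {
    // Assumption: 64-byte cachelines, as on the x64 and Neoverse N1 machines discussed in this PR.
    static final int CACHELINE_BYTES = 64;

    // True if an access of `size` bytes starting at byte address `addr` touches two cachelines.
    // Assumes addr >= 0 and 0 < size <= CACHELINE_BYTES.
    static boolean crossesCacheline(long addr, int size) {
        long firstLine = addr / CACHELINE_BYTES;
        long lastLine = (addr + size - 1) / CACHELINE_BYTES;
        return firstLine != lastLine;
    }

    // Counts how many of `n` back-to-back accesses of `size` bytes are split,
    // where the first access starts at byte address `addr`.
    static int countSplits(long addr, int size, int n) {
        int splits = 0;
        for (int i = 0; i < n; i++) {
            if (crossesCacheline(addr + (long) i * size, size)) {
                splits++;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println("aligned 64-byte accesses:    " + countSplits(0, 64, 16));
        System.out.println("misaligned 64-byte accesses: " + countSplits(8, 64, 16));
        System.out.println("misaligned 8-byte accesses:  " + countSplits(4, 8, 16));
    }
}
```

Under these assumptions, a misaligned stream of full-cacheline vectors splits on every access, while a misaligned stream of small vectors splits only occasionally, which is one way to read why the cost of misalignment would grow with vector size.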
On `aarch64`, the results are less clear. On two machines, the differences were marginal, but surprisingly aligning to loads was marginally faster. On another machine, aligning to stores was significantly faster. I suspect performance depends on the exact `aarch64` implementation. I'm not an `aarch64` specialist, and only have access to a limited number of machines. **Fix**: make automatic alignment configurable with `SuperWordAutomaticAlignment` (no alignment, align to store, align to load). Default is align to store. For now, I will just align to stores on all platforms. If someone has various `aarch64` machines, they are welcome to do deeper investigations. Same for other platforms. We could always turn the flag into a platform dependent one, and set different defaults depending on the exact CPU. If you are interested you can read my investigations/benchmark results below. There are a lot of colorful plots. **FYI about Vector API:** if you are working with the Vector API, you may also want to worry about **alignment**, because there can be a **significant performance impact** (30%+ in some cases). You may also want to know about **4k aliasing**, discussed below. **Shoutout:** - @jatin-bhateja filed the regression, and explained that it was about split stores. - @mhaessig helped me talk through some of the early benchmarks. - @iwanowww pointed me to the 4k aliasing explanation. -------------------- **Introduction** I had long lived with the **theory that on modern CPUs, misalignment has no consequence, especially no performance impact**. When you google, many sources say that misalignment used to be an issue on older CPUs, but not any more. That may **technically** be true: - A misaligned load or store that does not cross a cacheline boundary has no performance difference to an aligned load or store that does not cross a cacheline boundary.
- But: **A misaligned load or store that crosses a cacheline boundary is slower** than a misaligned load or store that does not cross a cacheline boundary. The reason is that a load or store that crosses a cacheline boundary is split, which means we now have two memory accesses instead of one. **So there is a connection**: alignment means the load or store cannot cross a cacheline boundary, assuming a cacheline is at least as long as the load / store (e.g. 64 byte cacheline and 64 byte load / store or smaller). Conversely, a misaligned load has a good chance to cross a cacheline boundary. Especially when we are auto vectorizing, we are accessing a contiguous block of memory, and so if our accesses are misaligned, we must cross the cacheline boundary at some point. Hence, **alignment has a performance impact in vectorization**. If we have a load and a store, but because of relative misalignment we can only align one: is it better to align the load or the store? Generally, x64 CPUs have more throughput for loads than stores. Splitting loads means we have more loads, which is not as bad as splitting more stores and having more stores going through the CPU. Hence, in most cases, it is better to align the store, and accept that the load is split. The above holds for `x64`, but on `aarch64` things are a little different / more complicated. For example, I found [JEP 315](https://openjdk.org/jeps/315), which mentions: > Avoid unaligned memory access if needed. Some CPU implementations impose penalties when issuing load/store instructions across a 16-byte boundary, a dcache-line boundary, or have different optimal alignment for different load/store instructions (see, for example, the Cortex A53 guide). If the aligned versions of intrinsics do not slow down code execution on alignment-independent CPUs, it may be beneficial to improve address alignment to help those CPUs that do have some penalties, provided it does not significantly increase code complexity. 
For the aarch64 machine I use, a Neoverse N1, the [N1 Optimization Guide](https://developer.arm.com/documentation/109896/latest/) says: > 4.5 Load/Store alignment > The Armv8.2-A architecture allows many types of load and store accesses to be arbitrarily aligned. The Neoverse N1 handles most unaligned accesses without performance penalties. However, there are cases which reduce bandwidth or incur additional latency, as described below. > - Load operations that cross a cache-line (64-byte) boundary. > - Quad-word load operations that are not 4B aligned. > - Store operations that cross a 16B boundary. Checking in a few other manuals, it is mostly about the 64-byte cacheline boundary for loads, and the 16-byte boundary for stores. These chips have the `neon` vector instructions, which are at most 16-byte (128 bit). From this I would personally conclude that with full alignment to vector length, there should be maximum performance. But the results below make me question that, and it seems I don't have the full picture yet. ------------------------ **Initial investigation using the Vector API** With the Vector API, we can produce code where we have direct control over what vector instructions are generated, and including their alignment. This means we can start with some experiments independent of the auto vectorizer. I wrote a stand-alone `Benchmark.java`, you can find it at the end of this PR. I did not integrate it, because it is not very well suited for regression testing, rather for visualization only. For regression testing, I am integrating the benchmark `VectorAutoAlignment.java`. Still, I am also integrating `VectorAutoAlignmentVisualization.java`, which can be used to visualize the effect of alignment for the auto vectorizer only.
Consider the following method, where we can vary the alignment of the load and store with `offset_load` and `offset_store`, respectively: public static void test1L1SVector(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += SPECIES.length()) { var v = IntVector.fromArray(SPECIES, arr1, base1 + i + offset_load); v.intoArray(arr0, base0 + i + offset_store); } } Let's start with a simple experiment, using `test1L1SVector`, with `SIZE = 2560` (produces clean results because not too many other effects) and `oneArray` (store at beginning of array, load in same array but SIZE elements later). Below the results for my AVX512 machine, that supports up to 64 byte vectors. I show the results for 64, 32, 16 and 8 bytes, i.e. 16, 8, 4, and 2 ints per vector. ![image](https://github.com/user-attachments/assets/8daeddc2-3dd5-4c23-b436-b9ffd89d4da2) x-axis (->): `offset_load` y-axis (up): `offset_store` We can see how there is a very clear grid for every size, and that the grid repeats with the vector size, i.e. the number of elements per vector. We see that store alignment has a larger effect on performance than load alignment. With 16 element vectors, we can even see a faint diagonal effect of relative alignment between the loads and stores, though I don't know the cause of that effect. Further: we can see that the smaller the vectors, the less extreme the relative differences appear. For 16 element vectors, runtime varies from `7.5 ms` to `11.5 ms`, but for 2 element vectors it only varies from `24.8 ms` to `29.5 ms`. My theory is that for 64 byte vectors, every unaligned vector is split, leading to roughly a doubling of operations. But for 8 byte vectors, only every 8th crosses a cacheline boundary, and the effect of splitting is thus much smaller. Something else that also is visible in these results: arrays are only 8-byte aligned. Every time I run the benchmark, e.g. for different vector lengths, the alignment of the base is different.
Thus, the "lines" of the "grid" do not always align between different runs of these benchmarks. **This has a quite significant implication for Vector API benchmarks**: if one does not control the alignment of the arrays, one might get drastically unstable measurements, the results can quickly vary very significantly. The `neon / asimd` N1 aarch64 machine provides vectors up to 128 bits, so we can only display the results for 4 and 2 element vectors of ints: ![image](https://github.com/user-attachments/assets/a01cf707-d3b2-4fa2-946f-27a44e43d8cc) x-axis (->): `offset_load` y-axis (up): `offset_store` Strangely, it seems only load alignment has a significant effect. That is quite surprising. ------------------------ **Investigating performance loss when crossing cacheline boundary, using Vector API** While the relative performance differences for different vector lengths already match the theory that only memory accesses that cross a cacheline boundary are split, we now show this effect with a special "skip" benchmark. I ran `Benchmark.java test1L1SVectorSkip 4 2560 oneArray`, i.e. with a "skip" benchmark where every int vector has 4 elements, and we skip every 4th vector: `[0 1 2 3 ][4 5 6 7 ][8 9 10 11][ skip ]`. If the cacheline boundary lies where we skip a vector, then we should have no performance loss compared to when we have perfect alignment. At least in theory. Left the result for my AVX512 laptop, right the result for the N1 aarch64 machine: ![image](https://github.com/user-attachments/assets/e42a06d2-d07c-4e0d-af28-755e79cdbbac) For comparison, the results from above, for the 4-element vectors without skipping: ![image](https://github.com/user-attachments/assets/5135e924-8f47-4339-8e6f-a87723f8f8be) Generally, we see similar 4-element wide "bands" in both directions. The results for the AVX512 machine are quite understandable, and very crisp: We have the same repeating grid as without skip, except that every 4th band in x and y direction is "skipped", i.e.
has the same performance as when aligned. **It seems the theory perfectly applies for my AVX512 machine.** **But the aarch64 results are stranger:** In x direction, i.e. for loads, every 4th band has better performance. That seems to correspond to the cacheline boundary of 64 bytes, i.e. when we skip, there is no load splitting. In y direction, i.e. for stores, every 2nd band has better performance. This is surprising, because the non skipping benchmark did not show any effect in this direction. And it seems to indicate some 32 byte effect, which neither corresponds to the 64 byte cacheline (otherwise we would have to see some effect that only shows every 4th band), nor to the 16 byte store boundaries mentioned in the N1 Optimization Guide. **This is really confusing.** We could further investigate the behavior with different element sizes and vector sizes, and different skip methods. ------------------------ **Discovering 4k aliasing artifacts in benchmark, using the Vector API** On my AVX512 machine, I found an effect that happens around `4k byte` boundaries, i.e. every `1024 ints`. For 64, 32 and 16 byte vectors, i.e. 16, 8, and 4 elements, and `SIZE = 2048`, so 8k bytes: ![image](https://github.com/user-attachments/assets/44468635-35c2-4cb3-a3be-577c5a2a5654) In the lower half triangles, we see the normal grid pattern. Modulo 4k bytes, the loads are ahead of the stores, that may explain why there is no effect. But the upper half triangles have drastically worse performance. The grid is now diagonal, probably dominated by relative alignment rather than absolute alignment. Modulo 4k bytes, the loads are behind the stores - my theory is that this conflicts with the loads having to happen first.
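To make the "modulo 4k bytes" relation a bit more concrete, here is a small sketch that computes how far the load is ahead of the store when only the lowest 12 address bits are considered. It assumes the `oneArray` layout (store at index `i + offset_store`, load at index `SIZE + i + offset_load` in the same `int[]`), so the array header cancels out; `FourKAliasing` and `loadAheadOfStoreMod4k` are hypothetical names, and reading a value close to 4096 as "load is behind the store" is my interpretation, not something the hardware documents in these terms.

```java
public class FourKAliasing {
    // Mask selecting the lowest 12 bits of a byte address, i.e. the address modulo 4k.
    static final int LOW_12_BITS = 4096 - 1;

    // How many bytes the load is ahead of the store, modulo 4k.
    // Both indices are int-array indices into the same array.
    static int loadAheadOfStoreMod4k(int loadIndex, int storeIndex) {
        int deltaBytes = (loadIndex - storeIndex) * Integer.BYTES;
        return deltaBytes & LOW_12_BITS;
    }

    public static void main(String[] args) {
        int size = 2048; // ints, i.e. 8k bytes, so the region offset vanishes modulo 4k
        for (int offsetStore = 3; offsetStore >= 0; offsetStore--) { // 0/0 at the bottom left
            StringBuilder line = new StringBuilder();
            for (int offsetLoad = 0; offsetLoad < 4; offsetLoad++) {
                line.append(String.format("%5d", loadAheadOfStoreMod4k(size + offsetLoad, offsetStore)));
            }
            System.out.println(line);
        }
    }
}
```

With `SIZE = 2048`, small values appear where `offset_load >= offset_store` and values just below 4096 where `offset_load < offset_store`, which lines up with the two triangles described above. With `SIZE = 2560`, the same computation gives values near 2048 for all small offsets, which may be why that size showed fewer side effects.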
I ran it on a larger grid (offsets from 0-127), and one can see that the effect slowly wears off (from red to orange) - ignore the noise, I had to lower the accuracy to complete this one in reasonable time: ![image](https://github.com/user-attachments/assets/a2f41cbf-d87c-4281-9372-1601dbfb3a7c) But it seems on the aarch64 machine, I cannot find this `4k byte` boundary effect. @iwanowww pointed me to this [article about 4k aliasing](https://github.com/Kobzol/hardware-effects/blob/master/4k-aliasing/README.md). The reason is that store-to-load-forwarding at first only operates on the lowest 12 bits of the address, and when it later detects that the rest of the address does not match, this incurs a penalty of a few cycles. Note: I only just learned about the effects of store-to-load-forwarding recently, see https://github.com/openjdk/jdk/pull/21521 / [JDK-8334431](https://bugs.openjdk.org/browse/JDK-8334431). ------------------------ **Investigation for automatic alignment in the Auto Vectorizer** To be able to investigate the performance of the Auto-Vectorizer (SuperWord), I made the automatic alignment configurable with `SuperWordAutomaticAlignment`. We can disable it, align with the store or with the load. The attached JMH benchmark `VectorAutoAlignmentVisualization.bench1L1S`, with automatic alignment disabled, looks like this: ![image](https://github.com/user-attachments/assets/3ee308e9-4e18-46db-a564-546be7ac8bba) This JMH benchmark is really slow, so we can also use the `Benchmark.java` from below. I ran it on my AVX512 laptop with `Benchmark.java test1L1SScalar 4 2560 oneArray`: ![image](https://github.com/user-attachments/assets/768cfe0c-c8d7-4d76-8004-3e8b52ae35ee) - Top left: no alignment. - Top right: align with store. - Bottom right: align with load. We can see that with no alignment, we have a grid with 90° angles.
If the stores are aligned, we get about `3.35 ms` runtime, if only loads are aligned we get about `4.4 ms`, and if neither is aligned `4.9 ms` - if both are aligned we get only `3.2 ms`. With automatic alignment on stores, we get an overall better performance. But we also see the pattern is now diagonal. In most cases, we only have the store aligned, and we get about `3.4-3.5 ms`. But when the load and store are relatively aligned, i.e. on the thin diagonals, then we get as little as `3.3 ms`. These performance numbers are comparable with the numbers we see on the "no alignment" plot on the horizontal lines where the stores are aligned. With automatic alignment on loads, we get an average performance that is better than without alignment, but worse than aligning with stores. In most cases only the load is aligned, and we get `4.4 ms`. But on the rare occasion where the loads and stores are relatively aligned, i.e. the thin diagonals, we get `3.2-3.3 ms`. These numbers are comparable with the numbers we see on the "no alignment" plot on the vertical lines, where the loads are aligned. But which one of these options should we now choose? I.e. what should be the default for `SuperWordAutomaticAlignment`? In general, we do not know the alignment of the load and store, so we should assume that we land on one of the cells at random. Thus, the relevant performance metric is the average over all cells. The benchmark below does exactly this: it runs the loop for every `offset_load` and `offset_store` combination, essentially computing the average over all combinations.
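The "average over all cells" argument can be sketched as a back-of-the-envelope calculation with the per-cell runtimes quoted above. The sketch assumes the grid repeats with a period of 4 offsets (the real period depends on the vector length that was actually used), and it takes `3.45 ms` and `3.25 ms` as midpoints of the quoted ranges; `ExpectedRuntime` and its methods are hypothetical names for this illustration only.

```java
public class ExpectedRuntime {
    // Average over a period x period tile of offsets without automatic alignment.
    // Exactly one offset per period is aligned for the load, and one for the store.
    static double averageNoAlignment(int period, double both, double storeOnly,
                                     double loadOnly, double neither) {
        int cells = period * period;
        int oneAligned = period - 1;                   // cells where only one side is aligned
        int noneAligned = (period - 1) * (period - 1); // cells where neither side is aligned
        return (both + oneAligned * storeOnly + oneAligned * loadOnly + noneAligned * neither) / cells;
    }

    // With automatic alignment, one side is always aligned. The other side is aligned
    // only on the diagonal, i.e. in 1 out of `period` cells.
    static double averageWithAlignment(int period, double relativelyAligned, double otherwise) {
        return (relativelyAligned + (period - 1) * otherwise) / period;
    }

    public static void main(String[] args) {
        int period = 4; // assumption, see lead-in
        System.out.println("no alignment:   " + averageNoAlignment(period, 3.2, 3.35, 4.4, 4.9));
        System.out.println("align to store: " + averageWithAlignment(period, 3.3, 3.45));
        System.out.println("align to load:  " + averageWithAlignment(period, 3.25, 4.4));
    }
}
```

Under these assumptions the ordering comes out as align to store, then align to load, then no alignment, which matches the qualitative description above. The measured totals from the benchmark remain the authoritative numbers.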
------------------------ **Automatic Alignment in SuperWord (Auto Vectorization)** Results with aliasing runtime checks https://github.com/openjdk/jdk/pull/24278, on `VectorAutoAlignment`, on my AVX512 laptop: ![image](https://github.com/user-attachments/assets/e8d9eb25-03ce-4ee2-bc2e-4f15899be341) Note: before https://github.com/openjdk/jdk/pull/24278, this benchmark never vectorizes, because we cannot prove that the load and store do not alias. There are clearly some artifacts around the 4k byte boundaries. See the discussion further up about 4k aliasing. Other than those artifacts, it is very clear that aligning with stores is the best on my AVX512 CPU. Aligning the loads is significantly worse, and not aligning at all slightly worse than that. But in any case: vectorization is always very clearly profitable, no matter the alignment. Running it on an aarch64 neon OCI machine: ![image](https://github.com/user-attachments/assets/7f1c23f7-b4ae-4e40-aa4a-da9e7d0556c7) The results look quite a bit different. Vectorization is still always profitable, no matter the alignment. But now, it seems aligning loads is fastest, and there is no difference between aligning stores or no alignment at all. I also ran it on our benchmark servers: `linux aarch64` (neon): ![image](https://github.com/user-attachments/assets/8e1d3c7a-0515-4ad5-b73c-3e5e45412ae6) `linux x64`: ![image](https://github.com/user-attachments/assets/2c3e4e81-84fc-4514-b00b-3fd7a1df330f) `macosx aarch64` (neon): ![image](https://github.com/user-attachments/assets/7976b67e-f112-4104-9cdc-9ba91e372151) `macosx x64`: ![image](https://github.com/user-attachments/assets/5f6dd1e2-81e3-4d32-834a-072074f01d83) `windows x64`: ![image](https://github.com/user-attachments/assets/8d79acc2-9ca6-496a-aee5-4f05157d6ee4) The `x64` results are fairly consistent: in most cases aligning to stores is best, except for the 4k artifacts. The `aarch64` results are less clear.
On two machines we see that aligning loads is marginally faster, but on one machine aligning to stores is faster. I suspect it may depend on the exact `aarch64` implementation. ------------------------ **Standalone Benchmark.java** I did not integrate it, because it is not very well suited for regression testing, rather for visualization only. For regression testing, I am integrating the benchmark `VectorAutoAlignment.java`. Still, I am also integrating `VectorAutoAlignmentVisualization.java`, which can be used to visualize the effect of alignment for the auto vectorizer only. I usually run the benchmark with command-lines like this: ./java -XX:CompileCommand=compileonly,Benchmark*::test* -XX:CompileCommand=printcompilation,Benchmark*::* -Xbatch -XX:+PrintIdeal -XX:CompileCommand=printassembly,Benchmark*::test* -XX:ObjectAlignmentInBytes=8 -XX:CompileCommand=TraceAutoVectorization,Benchmark*::test*,SW_INFO,ALIGN_VECTOR -XX:+TraceLoopOpts -XX:LoopUnrollLimit=60 -XX:MaxVectorSize=64 -XX:SuperWordAutomaticAlignment=0 Benchmark.java test1L1SVector 4 2432 separateArrays Here some relevant flags to play with: - `ObjectAlignmentInBytes`: alignment of objects, i.e. the arrays in these benchmarks. Default is `8` bytes, which means two arrays only have a relative alignment of `8` bytes. Hence, it may not always be possible to align both references to two arrays to more than `8` bytes, i.e. we can only guarantee 64-byte alignment of at most one of them. - `TraceAutoVectorization`: with the tag `ALIGN_VECTOR` we can see which memory reference we auto-align. - `LoopUnrollLimit`: some benchmarks have a rather large loop, and only auto-vectorize if this limit is artificially increased. - `MaxVectorSize`: we can artificially lower the maximum vector length, possibly breaking larger vectors into multiple smaller ones. - `SuperWordAutomaticAlignment`: controls if and how we auto-align. 
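The remark about `ObjectAlignmentInBytes` can be illustrated with a small sketch: given the byte address of the first accessed element, it computes how many scalar iterations are needed before the access stream reaches a 64-byte boundary. If two streams need a different number of iterations, they cannot both be aligned by the same pre-loop. `PreLoopAlignment`, `itersToAlign` and `canAlignBoth` are hypothetical names, and the addresses are made up; the actual SuperWord pre-loop adjustment is more involved than this.

```java
public class PreLoopAlignment {
    // Number of elements to process before an access stream starting at byte address
    // `addr` reaches an `alignBytes` boundary.
    // Assumes addr >= 0, addr is a multiple of elemBytes, and alignBytes is a multiple of elemBytes.
    static int itersToAlign(long addr, int elemBytes, int alignBytes) {
        long misalignment = addr % alignBytes;
        long bytesToBoundary = (alignBytes - misalignment) % alignBytes;
        return (int) (bytesToBoundary / elemBytes);
    }

    // Both streams advance by one element per iteration, so a single pre-loop can
    // align both only if they need the same number of iterations.
    static boolean canAlignBoth(long addrA, long addrB, int elemBytes, int alignBytes) {
        return itersToAlign(addrA, elemBytes, alignBytes) == itersToAlign(addrB, elemBytes, alignBytes);
    }

    public static void main(String[] args) {
        // Two int streams whose start addresses are only 8-byte aligned.
        long storeAddr = 8;
        long loadAddr = 24;
        System.out.println("store needs " + itersToAlign(storeAddr, 4, 64) + " iterations");
        System.out.println("load needs  " + itersToAlign(loadAddr, 4, 64) + " iterations");
        System.out.println("can align both: " + canAlignBoth(storeAddr, loadAddr, 4, 64));
    }
}
```

In this made-up case the two streams disagree on the iteration count, so only one of them can be brought to a 64-byte boundary, which is the situation where the choice between aligning the store and aligning the load matters.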
import jdk.incubator.vector.*; import java.nio.ByteOrder; import java.util.ArrayList; import java.util.Set; public class Benchmark { public static int SIZE; public static VectorSpecies SPECIES; public static int[] arr0; public static int[] arr1; public static int[] arr2; public static int[] arr3; public static int base0; public static int base1; public static int base2; public static int base3; public static void main(String[] args) { if (args.length != 4) { System.out.println("Error: need 4 arguments, got " + args.length); printUsage(); } String benchmarkName = args[0]; int vectorElements = Integer.parseInt(args[1]); if (!Set.of(2, 4, 8, 16).contains(vectorElements)) { System.out.println("Error: vectorElements must be 2, 4, 8, or 16, got " + vectorElements); printUsage(); } SPECIES = VectorSpecies.of(int.class, VectorShape.forBitSize(vectorElements * 4 * 8)); SIZE = Integer.parseInt(args[2]); if (SIZE < 2000 || SIZE > 100_000) { System.out.println("Error: dataSize out of range [2000, 100_000], got " + SIZE); printUsage(); } String scenario = args[3]; switch (scenario) { // Load / Store from different arrays. Relative alignment is not known. case "separateArrays" -> { arr0 = new int[SIZE]; arr1 = new int[SIZE]; arr2 = new int[SIZE]; arr3 = new int[SIZE]; base0 = 0; base1 = 0; base2 = 0; base3 = 0; } // Load / Store on same array -> base have a known relative alignment. // Use the whole array, every access has its own "region". case "oneArray" -> { int[] arr = new int[4 * SIZE]; arr0 = arr; arr1 = arr; arr2 = arr; arr3 = arr; base0 = 0 * SIZE; base1 = 1 * SIZE; base2 = 2 * SIZE; base3 = 3 * SIZE; } // Load / Store on same array -> base have a known relative alignment. // Small offset -> the memory accesses use the same memory "region". 
case "oneArraySmallOffset" -> { int[] arr = new int[4 * SIZE]; arr0 = arr; arr1 = arr; arr2 = arr; arr3 = arr; base0 = 0 * (1024 + 256); base1 = 1 * (1024 + 256); base2 = 2 * (1024 + 256); base3 = 3 * (1024 + 256); } default -> { System.out.println("Error: scenario does not exist: " + scenario); printUsage(); } } BenchmarkRunner.run(benchmarkName); } public static void printUsage() { System.out.println("Usage: java Benchmark.java "); System.out.println(" benchmark:"); System.out.println(" test1L1SVector test1L1SVectorSkip test1L1SScalar"); System.out.println(" test2L1SVector test2L1SVectorSkip test2L1SScalar test2L1SScalarRearranged"); System.out.println(" test3L1SVector test3L1SVectorSkip test3L1SScalar"); System.out.println(" vectorElements: 2, 4, 8, 16"); System.out.println(" dataSize: 2000 ... 100_000. Recommended: 2048."); System.out.println(" scenario: separateArrays oneArray oneArraySmallOffset"); System.exit(0); } } public class BenchmarkRunner { // Make sure the runner has all these fields final, so we get a better chance at optimisation. public static final int SIZE = Benchmark.SIZE; public static final VectorSpecies SPECIES = Benchmark.SPECIES; public static final int REPS = 50_000; // Repeat REPS times for a benchmark measurement. public static final int RUNS = 5; // Each benchmark measurement is repeated RUNS times, and MIN runtime is chosen. 
public static final int GRID = 32; public static int[] arr0 = Benchmark.arr0; // store public static int[] arr1 = Benchmark.arr1; // load public static int[] arr2 = Benchmark.arr2; // load public static int[] arr3 = Benchmark.arr3; // load public static final int base0 = Benchmark.base0; public static final int base1 = Benchmark.base1; public static final int base2 = Benchmark.base2; public static final int base3 = Benchmark.base3; interface GridBenchmark { void run(int offset_load, int offset_store); } public static void run(String benchmarkName) { switch (benchmarkName) { case "test1L1SVector" -> benchmarkGrid(BenchmarkRunner::test1L1SVector); case "test2L1SVector" -> benchmarkGrid(BenchmarkRunner::test2L1SVector); case "test3L1SVector" -> benchmarkGrid(BenchmarkRunner::test3L1SVector); case "test1L1SVectorSkip" -> benchmarkGrid(BenchmarkRunner::test1L1SVectorSkip); case "test2L1SVectorSkip" -> benchmarkGrid(BenchmarkRunner::test2L1SVectorSkip); case "test3L1SVectorSkip" -> benchmarkGrid(BenchmarkRunner::test3L1SVectorSkip); case "test1L1SScalar" -> benchmarkGrid(BenchmarkRunner::test1L1SScalar); case "test2L1SScalar" -> benchmarkGrid(BenchmarkRunner::test2L1SScalar); case "test3L1SScalar" -> benchmarkGrid(BenchmarkRunner::test3L1SScalar); case "test2L1SScalarRearranged" -> benchmarkGrid(BenchmarkRunner::test2L1SScalarRearranged); default -> { System.out.println("Error: benchmark does not exist: " + benchmarkName); Benchmark.printUsage(); } } System.out.println("Done: " + benchmarkName); System.out.println("x-axis (->) LOAD_OFFSET"); System.out.println("y-axis (up) STORE_OFFSET"); System.out.println("offset_load: load alignment shift"); System.out.println("offset_store: store alignment shift"); } public static void benchmarkGrid(GridBenchmark gt) { System.out.println("Initial Warmup"); for (int i = 0; i < 10 * REPS; i++) { gt.run(0, 0); } ArrayList list = new ArrayList<>(); float total = 0; for (int offset_store = 0; offset_store < GRID; offset_store++) { String 
line = ""; for (int offset_load = 0; offset_load < GRID; offset_load++) { float t = Float.POSITIVE_INFINITY; for (int i = 0; i < RUNS; i++) { t = Math.min(t, benchmark(offset_load, offset_store, gt)); } total += t; line += String.format("%.5f ", t); } System.out.println(line); list.add(line); } System.out.println("Results [ms]:"); // reverse the list, so the 0/0 point is at the bottom left. for (var line : list.reversed()) { System.out.println(line); } System.out.println("total [ms]: " + total); } public static float benchmark(int offset_load, int offset_store, GridBenchmark gt) { for (int i = 0; i < REPS; i++) { gt.run(offset_load, offset_store); } long t0 = System.nanoTime(); for (int i = 0; i < REPS; i++) { gt.run(offset_load, offset_store); } long t1 = System.nanoTime(); float t = (t1 - t0) * 1e-6f; return t; } public static void vector1L1S(int offset_load, int offset_store, int i) { var v = IntVector.fromArray(SPECIES, arr1, base1 + i + offset_load); v.intoArray(arr0, base0 + i + offset_store); } public static void vector2L1S(int offset_load, int offset_store, int i) { var v0 = IntVector.fromArray(SPECIES, arr1, base1 + i + offset_load); var v1 = IntVector.fromArray(SPECIES, arr2, base2 + i + offset_load); var v = v0.add(v1); v.intoArray(arr0, base0 + i + offset_store); } public static void vector3L1S(int offset_load, int offset_store, int i) { var v0 = IntVector.fromArray(SPECIES, arr1, base1 + i + offset_load); var v1 = IntVector.fromArray(SPECIES, arr2, base2 + i + offset_load); var v2 = IntVector.fromArray(SPECIES, arr3, base3 + i + offset_load); var v = v0.add(v1).add(v2); v.intoArray(arr0, base0 + i + offset_store); } public static void test1L1SVector(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += SPECIES.length()) { vector1L1S(offset_load, offset_store, i); } } public static void test2L1SVector(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += SPECIES.length()) { vector2L1S(offset_load, 
offset_store, i); } } public static void test3L1SVector(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += SPECIES.length()) { vector3L1S(offset_load, offset_store, i); } } // This one is to prove that the split happens on the cache line. // // Note: this is written for vectorElements = 4 public static void test1L1SVectorSkip(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += 16) { vector1L1S(offset_load, offset_store, i + 0); vector1L1S(offset_load, offset_store, i + 4); vector1L1S(offset_load, offset_store, i + 8); // Skip the "i + 12" step, so we do not always go over the cache line. } } // This one is to prove that the split happens on the cache line. // // Note: this is written for vectorElements = 4 public static void test2L1SVectorSkip(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += 16) { vector2L1S(offset_load, offset_store, i + 0); vector2L1S(offset_load, offset_store, i + 4); vector2L1S(offset_load, offset_store, i + 8); // Skip the "i + 12" step, so we do not always go over the cache line. } } // This one is to prove that the split happens on the cache line. // // Note: this is written for vectorElements = 4 public static void test3L1SVectorSkip(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 64 - GRID; i += 16) { vector3L1S(offset_load, offset_store, i + 0); vector3L1S(offset_load, offset_store, i + 4); vector3L1S(offset_load, offset_store, i + 8); // Skip the "i + 12" step, so we do not always go over the cache line. } } // Requires aliasing analysis runtime check JDK-8324751 to vectorize. public static void test1L1SScalar(int offset_load, int offset_store) { for (int i = 0; i < SIZE - GRID; i++) { int v = arr1[base1 + i + offset_load]; arr0[base0 + i + offset_store] = v; } } // Requires aliasing analysis runtime check JDK-8324751 to vectorize. 
public static void test2L1SScalar(int offset_load, int offset_store) { for (int i = 0; i < SIZE - GRID; i++) { int v0 = arr1[base1 + i + offset_load]; int v1 = arr2[base2 + i + offset_load]; var v = v0 + v1; arr0[base0 + i + offset_store] = v; } } // Requires aliasing analysis runtime check JDK-8324751 to vectorize. public static void test3L1SScalar(int offset_load, int offset_store) { for (int i = 0; i < SIZE - GRID; i++) { int v0 = arr1[base1 + i + offset_load]; int v1 = arr2[base2 + i + offset_load]; int v2 = arr3[base3 + i + offset_load]; var v = v0 + v1 + v2; arr0[base0 + i + offset_store] = v; } } // Vectorizes even without JDK-8324751, but requires -XX:LoopUnrollLimit=10000 because loop body is large. // Automatic alignment is ineffective here, because of the hand-unrolling -> pre-loop cannot change alignment. // Gets us some funky patterns, as automatic alignment sometimes seems to actually make things slightly worse. // // Note: this test does not react to vectorElements. public static void test2L1SScalarRearranged(int offset_load, int offset_store) { for (int i = 0; i < SIZE - 4 - GRID; i+=4) { int v00 = arr1[base1 + i + offset_load + 0]; int v10 = arr1[base1 + i + offset_load + 1]; int v20 = arr1[base1 + i + offset_load + 2]; int v30 = arr1[base1 + i + offset_load + 3]; int v01 = arr2[base2 + i + offset_load + 0]; int v11 = arr2[base2 + i + offset_load + 1]; int v21 = arr2[base2 + i + offset_load + 2]; int v31 = arr2[base2 + i + offset_load + 3]; var v0 = v00 + v01; var v1 = v10 + v11; var v2 = v20 + v21; var v3 = v30 + v31; arr0[base0 + i + offset_store + 0] = v0; arr0[base0 + i + offset_store + 1] = v1; arr0[base0 + i + offset_store + 2] = v2; arr0[base0 + i + offset_store + 3] = v3; } } } ------------- Commit messages: - Merge branch 'master' into JDK-8355094-SW-alignment - improve benchmarks - rename bench - more comments - fix whitespace - another benchmark - JDK-8355094 Changes: https://git.openjdk.org/jdk/pull/25065/files Webrev: 
https://webrevs.openjdk.org/?repo=jdk&pr=25065&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355094 Stats: 341 lines in 4 files changed: 340 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25065/head:pull/25065 PR: https://git.openjdk.org/jdk/pull/25065 From mhaessig at openjdk.org Thu May 15 08:29:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 15 May 2025 08:29:52 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store In-Reply-To: References: Message-ID: On Tue, 6 May 2025 13:21:30 GMT, Emanuel Peter wrote: > **Summary** > > Before [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155) / https://github.com/openjdk/jdk/pull/18822, we used to prefer aligning to stores. But in that change, I removed that preference, and since then we have been aligning to loads instead (there is no preference, but since loads usually come before stores in the loop body, the load gets picked). This led to a performance regression, especially on `x64`. > > Especially on `x64`, it is more important to align stores than to align loads. This is because memory operations that cross a cacheline boundary are split. And `x64` CPUs generally have more throughput for loads than for stores, so splitting a store is worse than splitting a load. > > On `aarch64`, the results are less clear. On two machines, the differences were marginal, but surprisingly aligning to loads was marginally faster. On another machine, aligning to stores was significantly faster. I suspect performance depends on the exact `aarch64` implementation. I'm not an `aarch64` specialist, and only have access to a limited number of machines. > > **Fix**: make automatic alignment configurable with `SuperWordAutomaticAlignment` (no alignment, align to store, align to load). Default is align to store. > > For now, I will just align to stores on all platforms.
If someone has various `aarch64` machines, they are welcome to do deeper investigations. Same for other platforms. We could always turn the flag into a platform-dependent one, and set different defaults depending on the exact CPU.
>
> If you are interested you can read my investigations/benchmark results below. There are a lot of colorful plots.
>
> **FYI about Vector API:** if you are working with the Vector API, you may also want to worry about **alignment**, because there can be a **significant performance impact** (30%+ in some cases). You may also want to know about **4k aliasing**, discussed below.
>
> **Shoutout:**
> - @jatin-bhateja filed the regression, and explained that it was about split stores.
> - @mhaessig helped me talk through some of the early benchmarks.
> - @iwanowww pointed me to the 4k aliasing explanation.
>
> --------------------
>
> **Introduction**
>
> I had long lived with the **theory that on modern CPUs, misalignment has no consequence, especially no performance impact**. When you google, many sources say that misalignment used to be an issue on older CPUs, but not any more.
>
> That may **technically** be true:
> - A misaligned load or store that does not cross a cacheline b...

Thank you for the deep investigation, the excellent report, and most of all the colorful plots! I found a typo, but otherwise the hotspot changes look good to me. I cannot review the benchmarks, unfortunately.

src/hotspot/share/opto/superword.cpp line 2676:

> 2674: // it is worse if a store is split, and less bad if a load is split.
> 2675: // By default, we have SuperWordAutomaticAlignment=1, i.e. we align with a
> 2676: // load if possible, to avoid splitting that load.

Suggestion:

    // By default, we have SuperWordAutomaticAlignment=1, i.e. we align with a
    // store if possible, to avoid splitting that store.

That conflicts with what the documentation in `c2_globals.hpp` says.
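The split-access effect described in the quoted summary can be illustrated with a small sketch. It is not taken from the PR: the 64-byte cacheline, the 32-byte vector width, and all class and method names below are assumptions chosen only for illustration.

```java
public class SplitAccessSketch {
    // Assumed cacheline size in bytes; typical for x64, but hardware-dependent.
    static final int CACHELINE_BYTES = 64;

    // True if an access of widthBytes starting at byteAddress touches two cachelines.
    static boolean crossesCacheline(long byteAddress, int widthBytes) {
        long offsetInLine = byteAddress % CACHELINE_BYTES;
        return offsetInLine + widthBytes > CACHELINE_BYTES;
    }

    // Counts split accesses among `count` back-to-back accesses of widthBytes each.
    static int countSplits(long startAddress, int widthBytes, int count) {
        int splits = 0;
        for (int i = 0; i < count; i++) {
            if (crossesCacheline(startAddress + (long) i * widthBytes, widthBytes)) {
                splits++;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed 32-byte vector accesses, 8 iterations.
        System.out.println("aligned start:    " + countSplits(0, 32, 8));
        System.out.println("misaligned start: " + countSplits(4, 32, 8));
    }
}
```

Under these assumptions, an aligned stream never splits, while a stream that starts 4 bytes off splits every second access. Since loads and stores in one loop generally cannot both be aligned, the alignment setting effectively chooses which stream pays for the splits.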
------------- PR Review: https://git.openjdk.org/jdk/pull/25065#pullrequestreview-2842727472 PR Review Comment: https://git.openjdk.org/jdk/pull/25065#discussion_r2090573854 From rcastanedalo at openjdk.org Thu May 15 08:30:04 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 15 May 2025 08:30:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: <1KfpDttB7jzWLna3uL6ZT2yvC43YL_2qvYSfS61ctXs=.4a9e5e00-4ba1-4a53-9056-02420c2890b4@github.com> Message-ID: On Wed, 14 May 2025 15:03:13 GMT, Emanuel Peter wrote: >> I remember discussing with @chhagedorn about what goes in the README and what in the Java file. Personally, I think as much as possible should go into the Java file. I think @chhagedorn prefers the README. So I ended up with a bit of a compromise, putting the same introduction in both places, but doing the details in the Java file. And I think I want to keep it this way. Unless @chhagedorn is ok with it if we just remove all but the first paragraph, and only refer to the Java file docs. >> >> What I'll do now anyway, based on your suggestion @robcasloz : link from the README to the Java file. > > And: in the Java file, I move the comment about the CompileFramework up, to mirror the README. Fair enough, thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2090595921 From rcastanedalo at openjdk.org Thu May 15 08:37:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 15 May 2025 08:37:59 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: <7QJJnZqTbVS7CwiGylXaI_7r1yCeQcH7JqPUechIB4A=.c40078f0-89b4-4c9b-a744-7aa429a35100@github.com> References: <7QJJnZqTbVS7CwiGylXaI_7r1yCeQcH7JqPUechIB4A=.c40078f0-89b4-4c9b-a744-7aa429a35100@github.com> Message-ID: On Wed, 14 May 2025 14:49:24 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 394: >> >>> 392: * @throws RendererException if there is a duplicate hashtag {@code key}. >>> 393: */ >>> 394: static TemplateBody let(String key, T value, Function function) { >> >> I found it a bit confusing to find two methods called `let` that are pretty different in nature. Maybe you could rename this one to e.g. `letIn`? > > @robcasloz They do pretty much the same though, they allow you to set a hashtag replacement. It is just a question of where you can place it, and if it captures the value in a Java variable as well. > > What do you mean to suggest with the name `setIn`? Note that I suggest the name `letIn`, not `setIn`. My intuition is that the variant with a third argument binds `key` to `value` **in** the scope of a `function` that is given explicitly, hence the suggestion to call it `letIn` instead of just `let`. But it's just a suggestion, feel free to disregard if you don't think it fits. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2090610270

From shade at openjdk.org Thu May 15 08:40:51 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 15 May 2025 08:40:51 GMT
Subject: RFR: 8356946: x86: Optimize interpreter profile updates
In-Reply-To:
References:
Message-ID:

On Wed, 14 May 2025 16:24:23 GMT, Vladimir Kozlov wrote:

>> Noticed two awkward things in current x86 interpreter profiling code.
>>
>> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged.
>>
>> Second, we care about overflows for 64-bit for some reason. I think this is a remnant of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for a 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this.
>>
>> So we can save a few instructions / memory accesses on this path.
>>
>> Additional testing:
>> - [x] Linux x86_64 server fastdebug, `tier1`
>> - [ ] Linux x86_64 server fastdebug, `all`
>
> src/hotspot/cpu/x86/interp_masm_x86.hpp line 217:
>
>> 215: void increment_mdp_data_at(Address data, bool decrement = false);
>> 216: void increment_mdp_data_at(Register mdp_in, int constant,
>> 217: bool decrement = false);
>
> `decrement` is never used?

Yes, `decrement` is never used; it is dead code.
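The overflow argument in the quoted description is simple arithmetic, sketched below. The increment rate is an assumption made for illustration (the thread does not state one), and the class and method names are hypothetical.

```java
public class CounterOverflowSketch {
    static final double SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60;

    // Seconds until an unsigned counter of the given bit width wraps around,
    // assuming a constant rate of increments per second.
    static double secondsToOverflow(int bits, double incrementsPerSecond) {
        return Math.pow(2.0, bits) / incrementsPerSecond;
    }

    public static void main(String[] args) {
        double rate = 1e9; // assumption: one increment per nanosecond, sustained
        System.out.printf("32-bit counter: %.1f seconds%n", secondsToOverflow(32, rate));
        System.out.printf("64-bit counter: %.0f years%n",
                          secondsToOverflow(64, rate) / SECONDS_PER_YEAR);
    }
}
```

Whatever rate is assumed, the 64-bit time is 2^32 times the 32-bit time, which is why the overflow handling only makes sense for 32-bit counters.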
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25223#discussion_r2090616558 From rcastanedalo at openjdk.org Thu May 15 09:02:05 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 15 May 2025 09:02:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: <7QJJnZqTbVS7CwiGylXaI_7r1yCeQcH7JqPUechIB4A=.c40078f0-89b4-4c9b-a744-7aa429a35100@github.com> References: <7QJJnZqTbVS7CwiGylXaI_7r1yCeQcH7JqPUechIB4A=.c40078f0-89b4-4c9b-a744-7aa429a35100@github.com> Message-ID: On Wed, 14 May 2025 14:48:01 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 461: >> >>> 459: * @param name The {@link Name} to be added to the current code frame. >>> 460: * @return The token that performs the defining action. >>> 461: */ >> >> The concept of "code frame" is not clear here, maybe you can introduce it or replace it by a concept that is already defined? > > I specified it a little: > > ~ 453 * Add a {@link Name} in the current scope, i.e. the innermost of either > + 454 * {@link Template#body} or {@link Hook#set}. Thanks, "scope" works better! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2090659990 From dlunden at openjdk.org Thu May 15 09:05:00 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 09:05:00 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: References: Message-ID: <-7Kl5q24ZQMhnJ0pxjq5AhpIhW0fiL7ZBd7QogZGrWc=.4b915ee2-f5e4-4ac3-9ddb-a6ce701f5ad9@github.com> On Mon, 12 May 2025 11:39:28 GMT, Christian Hagedorn wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/gcm.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano > > src/hotspot/share/opto/gcm.cpp line 685: > >> 683: // path relative to the load if there are no paths from early to LCA that go >> 684: // through the store's block. Such stores are not anti-dependent, and there is >> 685: // no need to update the LCA nor to add anti-dependence edges. > > Suggestion: > > // no need to update the load's LCA nor to add anti-dependence edges. For consistency, we would need to then also change "the LCA" to "the load's LCA" in many more places throughout the method, which increases verbosity. Is it not OK to simply use "LCA", since we have defined that this is the load's LCA at the start? We don't mention any other LCAs, so I don't think there'll be any confusion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090665625 From epeter at openjdk.org Thu May 15 09:21:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 09:21:34 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: > **Summary** > > Before [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155) / https://github.com/openjdk/jdk/pull/18822, we used to prefer aligning to stores. 
But in that change, I removed that preference, and since then we have been aligning to loads instead (there is no preference, but since loads usually come before stores in the loop body, the load gets picked). This led to a performance regression, especially on `x64`.
>
> Especially on `x64`, it is more important to align stores than to align loads. This is because memory operations that cross a cacheline boundary are split. And `x64` CPUs generally have more throughput for loads than for stores, so splitting a store is worse than splitting a load.
>
> On `aarch64`, the results are less clear. On two machines, the differences were marginal, but surprisingly aligning to loads was marginally faster. On another machine, aligning to stores was significantly faster. I suspect performance depends on the exact `aarch64` implementation. I'm not an `aarch64` specialist, and only have access to a limited number of machines.
>
> **Fix**: make automatic alignment configurable with `SuperWordAutomaticAlignment` (no alignment, align to store, align to load). Default is align to store.
>
> For now, I will just align to stores on all platforms. If someone has various `aarch64` machines, they are welcome to do deeper investigations. Same for other platforms. We could always turn the flag into a platform-dependent one, and set different defaults depending on the exact CPU.
>
> If you are interested you can read my investigations/benchmark results below. There are a lot of colorful plots.
>
> **FYI about Vector API:** if you are working with the Vector API, you may also want to worry about **alignment**, because there can be a **significant performance impact** (30%+ in some cases). You may also want to know about **4k aliasing**, discussed below.
>
> **Shoutout:**
> - @jatin-bhateja filed the regression, and explained that it was about split stores.
> - @mhaessig helped me talk through some of the early benchmarks.
> - @iwanowww pointed me to the 4k aliasing explanation.
>
> --------------------
>
> **Introduction**
>
> I had long lived with the **theory that on modern CPUs, misalignment has no consequence, especially no performance impact**. When you google, many sources say that misalignment used to be an issue on older CPUs, but not any more.
>
> That may **technically** be true:
> - A misaligned load or store that does not cross a cacheline b...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/superword.cpp

  Co-authored-by: Manuel Hässig

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25065/files
  - new: https://git.openjdk.org/jdk/pull/25065/files/bc71ef86..80914d8a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25065&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25065&range=00-01

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/25065.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25065/head:pull/25065

PR: https://git.openjdk.org/jdk/pull/25065

From epeter at openjdk.org Thu May 15 09:21:34 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 15 May 2025 09:21:34 GMT
Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 15 May 2025 08:15:24 GMT, Manuel Hässig wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update src/hotspot/share/opto/superword.cpp
>>
>> Co-authored-by: Manuel Hässig
>
> src/hotspot/share/opto/superword.cpp line 2676:
>
>> 2674: // it is worse if a store is split, and less bad if a load is split.
>> 2675: // By default, we have SuperWordAutomaticAlignment=1, i.e. we align with a
>> 2676: // load if possible, to avoid splitting that load.
>
> Suggestion:
>
> // By default, we have SuperWordAutomaticAlignment=1, i.e.
we align with a > // store if possible, to avoid splitting that store. > > That conflicts with what the documentation in `c2_globals.hpp` says. Yikes! Thanks for the catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25065#discussion_r2090697049 From dlunden at openjdk.org Thu May 15 09:21:57 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 09:21:57 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: References: Message-ID: <3MDXPZrZdrlYV-jdjA_uK6gONBoQOh2ld-6-MpncqA8=.c50c8d1d-9baf-496a-ae2c-412901b27118@github.com> On Mon, 12 May 2025 12:44:48 GMT, Christian Hagedorn wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/gcm.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano > > src/hotspot/share/opto/gcm.cpp line 762: > >> 760: // and other inputs are first available. (Computed by schedule_early.) >> 761: // For normal loads, 'early' is the shallowest place (dom graph wise) >> 762: // to look for anti-deps between this load and any store. > > Just noticed when reading through the method. Cannot suggest since it's hidden: > L766-768: > - different than the schedule_early block in that it could be -> different from the schedule_early block when it is > - anti-dependences -> anti-dependencies. A note on "dependencies" and "dependences": these are plural forms of "dependency" and "dependence", respectively. From what I can tell, the terms can be used more or less interchangeably, but there are some subtle differences. I will not advocate for one or the other, but as can be seen from the method name itself, other related method names, and pre-existing source code comments, the currently chosen term is "dependence". The plural is therefore "dependences", and not "dependencies". 
> src/hotspot/share/opto/gcm.cpp line 797: > >> 795: // The input load uses some memory state (initial_mem). >> 796: Node* initial_mem = load->in(MemNode::Memory); >> 797: // To find anti-dependences we must look for users of the same memory state. > > Suggestion: > > // To find anti-dependencies, we must look for users of the same memory state. See other comment above! > src/hotspot/share/opto/gcm.cpp line 964: > >> 962: if (use_mem_state->is_Phi()) { >> 963: // We have reached a memory Phi node. On our search from initial_mem to >> 964: // the Phi, we have found no anti-dependences (otherwise, we would have > > Suggestion: > > // the Phi, we have found no anti-dependencies (otherwise, we would have See other comment above! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090694025 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090695143 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090697718 From dlunden at openjdk.org Thu May 15 09:34:39 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 09:34:39 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7] In-Reply-To: References: Message-ID: > The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. > > ### Changeset > > - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. 
The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. > - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. > - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. > - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: > - Clean up how we identify the search root (avoid mutation). > - Add a missing early exit for `Phi` nodes when `LCA == early`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Update after comments from Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24926/files - new: https://git.openjdk.org/jdk/pull/24926/files/11e37390..aec59a15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24926&range=05-06 Stats: 34 lines in 1 file changed: 2 ins; 1 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/24926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24926/head:pull/24926 PR: https://git.openjdk.org/jdk/pull/24926 From dlunden at openjdk.org Thu May 15 09:34:41 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 09:34:41 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 11:26:58 GMT, Christian Hagedorn wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/gcm.cpp >> >> Co-authored-by: Roberto Casta?eda Lozano > > 
src/hotspot/share/opto/gcm.cpp line 666: > >> 664: }; >> 665: >> 666: //------------------------raise_above_anti_dependences--------------------------- > > These legacy headers can be removed when we touch them. > Suggestion: Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 667: > >> 665: >> 666: //------------------------raise_above_anti_dependences--------------------------- >> 667: // Enforce a scheduling of the argument load that ensures anti-dependent stores > > I suggest to add `'` to make a mapping to the parameter: > Suggestion: > > // Enforce a scheduling of the given 'load' that ensures anti-dependent stores > > > Same on L670 but somehow I cannot make a suggestion there. Maybe due to the existing PR comment? Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 672: > >> 670: // The argument load has a current scheduling range in the dominator tree that >> 671: // starts at the load's early block (computed in schedule_early) and ends at >> 672: // the argument LCA block. However, there may still exist anti-dependent stores > > argument = load? > Suggestion: > > // the load's LCA block. However, there may still exist anti-dependent stores LCA is actually a separate argument to the method. Rewrote it a bit now so it is hopefully clearer, have a look and see what you think. > src/hotspot/share/opto/gcm.cpp line 673: > >> 671: // starts at the load's early block (computed in schedule_early) and ends at >> 672: // the argument LCA block. However, there may still exist anti-dependent stores >> 673: // in between the early block and the LCA that overwrite memory that the load > > Suggestion: > > // between the early block and the LCA that overwrite memory that the load Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 679: > >> 677: // latest in the store's block, and >> 678: // 2. 
if the load may get scheduled in the store's block, additionally insert >> 679: // an anti-dependence edge from the load to the store to ensure LCM > > Maybe you can also mention here that this is done by adding a precedence edge: > > Suggestion: > > // an anti-dependence edge (i.e. precedence edge) from the load to the store to ensure LCM Thanks, added! > src/hotspot/share/opto/gcm.cpp line 721: > >> 719: // >> 720: // The raise_above_anti_dependences method returns the updated LCA and ensures >> 721: // there are no anti-dependent stores between the load's early block and the > > Maybe to be explicit: > Suggestion: > > // The raise_above_anti_dependences method returns the updated LCA and ensures > // there are no anti-dependent stores in any block between the load's early block and the Thanks, added! > src/hotspot/share/opto/gcm.cpp line 724: > >> 722: // updated LCA. Any stores in the updated LCA will have new anti-dependence >> 723: // edges back to the load. The caller may schedule the load in the LCA, or it >> 724: // may hoist the load above the LCA, if it is not the early block. > > Suggestion: > > // may hoist the load above the LCA, if the updated LCA is not the early block. Thanks, updated. I also changed other occurrences of "the LCA" in this paragraph to "the updated LCA" for consistency. > src/hotspot/share/opto/gcm.cpp line 725: > >> 723: // edges back to the load. The caller may schedule the load in the LCA, or it >> 724: // may hoist the load above the LCA, if it is not the early block. >> 725: Block* PhaseCFG::raise_above_anti_dependences(Block* LCA, Node* load, bool verify) { > > Could not hurt: > Suggestion: > > Block* PhaseCFG::raise_above_anti_dependences(Block* LCA, Node* load, const bool verify) { Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 758: > >> 756: node_idx_t load_index = load->_idx; >> 757: >> 758: // Note the earliest legal placement of 'load', as determined by > > Note = get? 
> > Suggestion: > > // Get the earliest legal placement of 'load', as determined by I changed to "Record the earliest ..." instead as I think that is more clear. > src/hotspot/share/opto/gcm.cpp line 760: > >> 758: // Note the earliest legal placement of 'load', as determined by >> 759: // the unique point in the dominator tree where all memory effects >> 760: // and other inputs are first available. (Computed by schedule_early.) > > Suggestion: > > // and other inputs are first available (computed by schedule_early). Thanks, fixed! Also made some other edits in this comment. > src/hotspot/share/opto/gcm.cpp line 780: > >> 778: ResourceArea* area = Thread::current()->resource_area(); >> 779: >> 780: // Bookkeeping of possibly anti-dependent stores that we find outside of the > > Suggestion: > > // Bookkeeping of possibly anti-dependent stores that we find outside the Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 854: > >> 852: // - just past a MergeMem with the edge (MergeMem, use_mem_state). >> 853: assert(def_mem_state == nullptr || def_mem_state == initial_mem || >> 854: def_mem_state->is_MergeMem(), > > Suggestion: > > assert(def_mem_state == nullptr || def_mem_state == initial_mem || > def_mem_state->is_MergeMem(), Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 857: > >> 855: "unexpected memory state"); >> 856: >> 857: uint op = use_mem_state->Opcode(); > > For good measure: > Suggestion: > > const uint op = use_mem_state->Opcode(); Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 894: > >> 892: >> 893: assert(!use_mem_state->is_MergeMem(), >> 894: "use_mem_state should be either a store or a memory Phi"); > > Suggestion: > > assert(!use_mem_state->is_MergeMem(), > "use_mem_state should be either a store or a memory Phi"); Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 973: > >> 971: // |?? >> 972: // ||| >> 973: // Phi > > How about: > Suggestion: > > // def_mem_state > // | > // | ? ? 
> // \ | / > // Phi Yes, looks better, thanks (fixed). > src/hotspot/share/opto/gcm.cpp line 1026: > >> 1024: } else if (use_mem_state_block != early) { >> 1025: // We found an anti-dependent store outside the load's 'early' block. >> 1026: // The store may be between the current LCA and earliest possible block > > Suggestion: > > // The store may be between the current LCA and the earliest possible block Thanks, fixed! > src/hotspot/share/opto/gcm.cpp line 1054: > >> 1052: } >> 1053: } >> 1054: // Worklist is now empty; we have visited all possible anti-dependences. > > Suggestion: > > // Worklist is now empty; we have visited all possible anti-dependencies. See other comment above! > src/hotspot/share/opto/gcm.cpp line 1058: > >> 1056: // Finished if 'load' must be scheduled in its 'early' block. >> 1057: // If we found any stores there, they have already been given >> 1058: // precedence edges. > > Might be clearer since we always talked about anti-dependency edges while the concept to implement them are precedence edges. > Suggestion: > > // anti-dependency edges. Yes, best to be consistent. Fixed (although I changed to anti-dependence, see other comment above). > src/hotspot/share/opto/gcm.cpp line 1087: > >> 1085: // load from sinking past any block containing a store that may overwrite >> 1086: // memory that the load must witness. >> 1087: > > Suggestion: > > // > // The raised LCA will be a lower bound for placing the load, preventing the > // load from sinking past any block containing a store that may overwrite > // memory that the load must witness. > // Thanks, fixed! 
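For readers following the LCA terminology in this review: raising an LCA in a dominator tree can be sketched as below. This is a generic illustration, not HotSpot's implementation; the `Block` class, its fields, and `domLca` are hypothetical names, and the example models a simple diamond-shaped CFG in which `merge` is immediately dominated by `early`.

```java
public class DomLcaSketch {
    // Minimal dominator-tree node: each block knows its immediate dominator and depth.
    static final class Block {
        final String name;
        final Block idom;  // immediate dominator; null for the root
        final int depth;   // distance from the root in the dominator tree

        Block(String name, Block idom) {
            this.name = name;
            this.idom = idom;
            this.depth = (idom == null) ? 0 : idom.depth + 1;
        }
    }

    // Least common ancestor of a and b in the dominator tree.
    static Block domLca(Block a, Block b) {
        while (a.depth > b.depth) a = a.idom;
        while (b.depth > a.depth) b = b.idom;
        while (a != b) {
            a = a.idom;
            b = b.idom;
        }
        return a;
    }

    public static void main(String[] args) {
        Block root = new Block("root", null);
        Block early = new Block("early", root);
        Block left = new Block("left", early);    // one arm of the diamond
        Block right = new Block("right", early);  // other arm of the diamond
        Block merge = new Block("merge", early);  // join point, dominated by 'early'

        // Suppose the load's current LCA is 'merge' and a possibly anti-dependent
        // store sits in 'left'. Raising the LCA means taking the dominator-tree
        // LCA of the current LCA and the store's block.
        Block raised = domLca(merge, left);
        System.out.println(raised.name);
    }
}
```

In this sketch the raised LCA ends up in a block that dominates the store's block, so the load can no longer sink below the store.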
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090706887 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090707290 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090708819 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090709124 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090709509 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090710421 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090711906 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090712169 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090714079 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090715205 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090715489 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090717606 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090717853 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090717990 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090718600 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090718878 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090719119 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090721321 PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090721476 From dlunden at openjdk.org Thu May 15 09:34:41 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 09:34:41 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: <3MDXPZrZdrlYV-jdjA_uK6gONBoQOh2ld-6-MpncqA8=.c50c8d1d-9baf-496a-ae2c-412901b27118@github.com> References: 
<3MDXPZrZdrlYV-jdjA_uK6gONBoQOh2ld-6-MpncqA8=.c50c8d1d-9baf-496a-ae2c-412901b27118@github.com> Message-ID: On Thu, 15 May 2025 09:16:58 GMT, Daniel Lund?n wrote: >> src/hotspot/share/opto/gcm.cpp line 762: >> >>> 760: // and other inputs are first available. (Computed by schedule_early.) >>> 761: // For normal loads, 'early' is the shallowest place (dom graph wise) >>> 762: // to look for anti-deps between this load and any store. >> >> Just noticed when reading through the method. Cannot suggest since it's hidden: >> L766-768: >> - different than the schedule_early block in that it could be -> different from the schedule_early block when it is >> - anti-dependences -> anti-dependencies. > > A note on "dependencies" and "dependences": these are plural forms of "dependency" and "dependence", respectively. From what I can tell, the terms can be used more or less interchangeably, but there are some subtle differences. I will not advocate for one or the other, but as can be seen from the method name itself, other related method names, and pre-existing source code comments, the currently chosen term is "dependence". The plural is therefore "dependences", and not "dependencies". > * different than the schedule_early block in that it could be -> different from the schedule_early block when it is Thanks, fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090716265 From jsjolen at openjdk.org Thu May 15 09:54:53 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 15 May 2025 09:54:53 GMT Subject: RFR: 8356946: x86: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 14 May 2025 09:48:54 GMT, Aleksey Shipilev wrote: > Noticed two awkward things in current x86 interpreter profiling code. > > First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. > > Second, we care about overflows for 64-bit for some reason. 
I think this is a remnant of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for a 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this.
>
> So we can save a few instructions / memory accesses on this path.
>
> Additional testing:
> - [x] Linux x86_64 server fastdebug, `tier1`
> - [ ] Linux x86_64 server fastdebug, `all`

Marked as reviewed by jsjolen (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/25223#pullrequestreview-2843046293

From shade at openjdk.org Thu May 15 10:48:09 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 15 May 2025 10:48:09 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v17]
In-Reply-To:
References:
Message-ID:

> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in the queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
>
> The code tries to switch a weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>
> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by the AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in the number of methods in the queue, so we end up calling `CompileTask::is_unloaded` very often.
> > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Fix release builds - More thorough locking and redefinition escape hatch - Fix build failures: add more headers - Tracking UMH state more accurately - Rework for safer concurrency - ... 
and 20 more: https://git.openjdk.org/jdk/compare/5c73dfc2...4d33a4d5 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=16 Stats: 422 lines in 12 files changed: 379 ins; 19 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From avoitylov at openjdk.org Thu May 15 11:09:06 2025 From: avoitylov at openjdk.org (Aleksei Voitylov) Date: Thu, 15 May 2025 11:09:06 GMT Subject: Integrated: 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 12:39:40 GMT, Aleksei Voitylov wrote: > The root of the problem is that the VectorizedHashCode intrinsic introduced by JDK-8341194 is not aware of JDK-8079203. JDK-8079203 generates an additional nop with each madd instruction on Cortex-A53 as a workaround for Cortex-A53 erratum 835769 "AArch64 multiply-accumulate instruction might produce incorrect result". The current VectorizedHashCode intrinsic calculates a byte offset to jump inside the unrolled loop code. It assumes 2 instructions per unrolled iteration (load and madd). JDK-8079203 adds an additional nop for Cortex-A53, which breaks the offset calculation logic. > > The current offset calculation logic uses a shift instead of a multiplication, which requires a power-of-2 number of instructions in each unrolled loop iteration. To keep it simple, this fix adds one more nop into each loop iteration on Cortex-A53 in order to have 4 instructions per iteration, which is also a power of 2. To account for that, the shift argument for the offset calculation logic is increased by 1, because each loop iteration has twice as many instructions on Cortex-A53. >
> This fix is tested on Raspberry Pi 3 (based on Cortex-A53) by running initially reported application and by running hotspot jtreg tests (not a single test could be run on Cortex-A53 before the fix). After the fix, the specialized test hotspot/jtreg/compiler/intrinsics/TestArraysHashCode.java passes. > > The performance gain from the intrinsic is also observed on Cortex-A53 using the ArraysHashCode benchmark. This pull request has now been integrated. Changeset: 883e52aa Author: Aleksei Voitylov Committer: Dmitry Chuyko URL: https://git.openjdk.org/jdk/commit/883e52aa105727f4bc852d1497e049b689695152 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod 8353237: [AArch64] Incorrect result of VectorizedHashCode intrinsic on Cortex-A53 Reviewed-by: aph ------------- PR: https://git.openjdk.org/jdk/pull/24489 From chagedorn at openjdk.org Thu May 15 11:41:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 11:41:58 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: <-7Kl5q24ZQMhnJ0pxjq5AhpIhW0fiL7ZBd7QogZGrWc=.4b915ee2-f5e4-4ac3-9ddb-a6ce701f5ad9@github.com> References: <-7Kl5q24ZQMhnJ0pxjq5AhpIhW0fiL7ZBd7QogZGrWc=.4b915ee2-f5e4-4ac3-9ddb-a6ce701f5ad9@github.com> Message-ID: On Thu, 15 May 2025 09:02:12 GMT, Daniel Lund?n wrote: >> src/hotspot/share/opto/gcm.cpp line 685: >> >>> 683: // path relative to the load if there are no paths from early to LCA that go >>> 684: // through the store's block. Such stores are not anti-dependent, and there is >>> 685: // no need to update the LCA nor to add anti-dependence edges. >> >> Suggestion: >> >> // no need to update the load's LCA nor to add anti-dependence edges. > > For consistency, we would need to then also change "the LCA" to "the load's LCA" in many more places throughout the method, which increases verbosity. Is it not OK to simply use "LCA", since we have defined that this is the load's LCA at the start? 
We don't mention any other LCAs, so I don't think there'll be any confusion. I matched that with the comment suggestion earlier but you changed that now to a better version. So, I agree with you here, let's leave it as it is :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090973749 From dlunden at openjdk.org Thu May 15 11:48:55 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 15 May 2025 11:48:55 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v6] In-Reply-To: References: <-7Kl5q24ZQMhnJ0pxjq5AhpIhW0fiL7ZBd7QogZGrWc=.4b915ee2-f5e4-4ac3-9ddb-a6ce701f5ad9@github.com> Message-ID: On Thu, 15 May 2025 11:39:20 GMT, Christian Hagedorn wrote: >> For consistency, we would need to then also change "the LCA" to "the load's LCA" in many more places throughout the method, which increases verbosity. Is it not OK to simply use "LCA", since we have defined that this is the load's LCA at the start? We don't mention any other LCAs, so I don't think there'll be any confusion. > > I matched that with the comment suggestion earlier but you changed that now to a better version. So, I agree with you here, let's leave it as it is :-) Ah, I see. Good! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2090979211 From chagedorn at openjdk.org Thu May 15 11:48:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 11:48:54 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 09:34:39 GMT, Daniel Lundén wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`.
The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Update after comments from Christian Thanks for doing all the updates, looks good to me now! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2843388295 From rcastanedalo at openjdk.org Thu May 15 11:58:55 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 15 May 2025 11:58:55 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 09:34:39 GMT, Daniel Lund?n wrote: >> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented. >> >> ### Changeset >> >> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`. The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate. >> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`. >> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places. >> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`: >> - Clean up how we identify the search root (avoid mutation). >> - Add a missing early exit for `Phi` nodes when `LCA == early`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
> > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Update after comments from Christian Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24926#pullrequestreview-2843418112 From roland at openjdk.org Thu May 15 12:09:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:09:23 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, The `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. 
> > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initalize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in is own `NarrowMemProj` is > added to the memory sugraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA split types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request incrementally with 14 additional commits since the last revision: - typo - more - more - more - more - more - more - more - more - review - ... 
and 4 more: https://git.openjdk.org/jdk/compare/7afc47e4...af8480c0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/7afc47e4..af8480c0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=04-05 Stats: 363 lines in 13 files changed: 237 ins; 52 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From mchevalier at openjdk.org Thu May 15 12:12:31 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 15 May 2025 12:12:31 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: > Adding a `StressLoopPeeling` dev flag that randomize peeling. > > ## Semantics > > For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. > > This requires to distinguish two things: > - not inlining because it's not legal: see for instance > ```cpp > assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); > ``` > in `PhaseIdealLoop::do_peeling` > - not inlining because it doesn't seem profitable. > > Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! > > Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. 
Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. > > I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. > > > > ## The Flag > > The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. > > But once again: let's see what happens. > > > ## On the Code > > The field `_peel... 
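
The distinction drawn in the description above, between bounding the number of "yes, please, peel" answers and bounding the number of peeling opportunities, can be sketched with a small standalone model. This is only an illustration under stated assumptions: the function names, the 50% coin flip, and the notion of a "request" are hypothetical and are not taken from the patch or from HotSpot.

```cpp
#include <algorithm>
#include <random>

// Hypothetical model, not HotSpot code: a "request" is one point where the
// compiler could peel a loop, and a fair coin decides whether it does.

// Policy A: cap the number of "yes" answers. Requests keep being considered
// until the cap is reached, so with enough requests the cap is always hit.
int peels_with_capped_yes(int requests, int cap, std::mt19937& rng) {
  std::bernoulli_distribution coin(0.5);
  int peels = 0;
  for (int i = 0; i < requests && peels < cap; ++i) {
    if (coin(rng)) {
      ++peels;
    }
  }
  return peels;
}

// Policy B: cap the number of requests that are considered at all. The
// number of peels is then spread between 0 and the cap.
int peels_with_capped_requests(int requests, int cap, std::mt19937& rng) {
  std::bernoulli_distribution coin(0.5);
  const int considered = std::min(requests, cap);
  int peels = 0;
  for (int i = 0; i < considered; ++i) {
    if (coin(rng)) {
      ++peels;
    }
  }
  return peels;
}
```

In this model, calling `peels_with_capped_yes` with a request count much larger than `cap` returns `cap` with overwhelming probability, whereas `peels_with_capped_requests` returns a value anywhere between 0 and `cap`, which matches the "more evenly distributed" behavior the description argues for.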
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Back to PRODUCT for consistency ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25140/files - new: https://git.openjdk.org/jdk/pull/25140/files/8ce4fa35..a2dd68c9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25140.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25140/head:pull/25140 PR: https://git.openjdk.org/jdk/pull/25140 From roland at openjdk.org Thu May 15 12:15:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:15:55 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Wed, 23 Apr 2025 11:59:12 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> TestIterativeEA fix > > src/hotspot/share/opto/escape.cpp line 4123: > >> 4121: result = result->in(MemNode::Memory); >> 4122: } >> 4123: if (!is_instance && result->Opcode() == Op_NarrowMemProj) { > > Seems you are checking for `NarrowMemProj` and casting to it in multiple places. Why not enable the macro to do `is_...` and `as_...`? Done in new commit. > src/hotspot/share/opto/escape.cpp line 4126: > >> 4124: // Memory for non known instance can safely skip over a known instance allocation (that memory state doesn't access >> 4125: // the result of an allocation for a known instance). 
>> 4126: assert(result->as_Proj()->_con == TypeFunc::Memory, "a NarrowMemProj can only be a memory projection"); > > Can we verify that already in the `NarrowMemProj`, i.e. its constructor? `TypeFunc::Memory` is now hardcoded in the `NarrowMemProj` constructor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2091030783 PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2091033143 From roland at openjdk.org Thu May 15 12:15:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:15:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Wed, 23 Apr 2025 12:27:51 GMT, Emanuel Peter wrote: >> Ah, or does it already get printed from the `adr_type()` i.e. the virtual method? > > Hmm, no I don't think so... right? Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2091030444 From mchevalier at openjdk.org Thu May 15 12:16:53 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 15 May 2025 12:16:53 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:41:22 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/compile.cpp line 666: >> >>> 664: _congraph(nullptr), >>> 665: NOT_PRODUCT(_igv_printer(nullptr) COMMA) >>> 666: NOT_PRODUCT(_peeling_rounds_of_node(comp_arena(), 8, 0, Pair(0, 0)) COMMA) >> >> `NOT_PRODUCT` means that it's also available in the optimized build but you only want/need it in debug. > > Should I use `#ifdef ASSERT`? My thinking is that I want the code to be there whenever the flag is available. 
This is decided here: > https://github.com/openjdk/jdk/blob/a989245a2424d136f5d2a828eda666c3867b0f48/src/hotspot/share/runtime/flags/jvmFlag.cpp#L552-L556 > (called from `JVMFlag::find_flag` with `return_flag == false`, from `Arguments::find_jvm_flag`, from `Arguments::parse_argument` from `Arguments::process_argument`) with > https://github.com/openjdk/jdk/blob/a989245a2424d136f5d2a828eda666c3867b0f48/src/hotspot/share/runtime/flags/jvmFlag.cpp#L60-L66 > > My thinking is that it's unintuitive to me to offer a flag that could have no effect. Why don't we want it in optimized non-product build? We can still stress-peel and even if we won't hit an assert, we could still observe unexpected behaviors or crashes. Does that make sense? As discussed, I came back to `PRODUCT` for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091034546 From mchevalier at openjdk.org Thu May 15 12:16:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 15 May 2025 12:16:56 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 10:33:33 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Back to PRODUCT for consistency > > src/hotspot/share/opto/compile.cpp line 5295: > >> 5293: >> 5294: uint& Compile::peeling_rounds_at_node(const Node* const head) { >> 5295: for(int i = 0; i < _peeling_rounds_of_node.length(); ++i) { > > Suggestion: > > for (int i = 0; i < _peeling_rounds_of_node.length(); ++i) { This code is gone. > src/hotspot/share/opto/compile.cpp line 5297: > >> 5295: for(int i = 0; i < _peeling_rounds_of_node.length(); ++i) { >> 5296: auto& head_and_round_count = _peeling_rounds_of_node.at(i); >> 5297: if(head_and_round_count.first == head->_idx) { > > Suggestion: > > if (head_and_round_count.first == head->_idx) { This code is gone. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091035138 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091035362 From roland at openjdk.org Thu May 15 12:20:00 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:20:00 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v2] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Wed, 23 Apr 2025 12:29:06 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> TestIterativeEA fix > > test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 39: > >> 37: * @run main/othervm -Xcomp >> 38: * -XX:CompileCommand=compileonly,compiler.macronodes.TestEliminationOfAllocationWithoutUse::test* >> 39: * compiler.macronodes.TestEliminationOfAllocationWithoutUse > > Would a run without Xcomp make sense? Some test methods run a loop for only a few iterations, not sure a run without `-Xcomp` makes much sense. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2091039980 From roland at openjdk.org Thu May 15 12:35:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:35:54 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> Message-ID: <4dJ_Ihb3MSWot7pVQ4nvUpHIxjdofuyw1dJyVcEJJt4=.4b5f6691-5891-4d8c-aaff-10fd27264eab@github.com> On Thu, 15 May 2025 12:09:23 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. 
Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with 14 additional commits since the last revision: > > - typo > - more > - more > - more > - more > - more > - more > - more > - more > - review > - ... and 4 more: https://git.openjdk.org/jdk/compare/7afc47e4...af8480c0 I reworked this quite a bit. 1. In line with Emanuel's comment about making `_adr_type` `const` in `NarrowMemProj`, I changed the code in EA so, rather than update in place the `_adr_type` of the existing `NarrowMemProj`s, it creates new ones with the new `_adr_type`. 
That simplifies things because rather than replace existing `NarrowMemProj`s with new ones (which is what the update in place did in practice), the new code adds extra `NarrowMemProj`s and doesn't remove the existing ones. That, in turn, simplifies the logic elsewhere in EA because there's no need for the memory state that's not specific to a known allocation to be rewired around known allocations. 2. I added asserts to catch cases where `proj_out` is called but the node has more than one matching projection. With those asserts, I caught some false positives and cases where we got lucky, and worked around them by reworking the code so it doesn't use `proj_out`. That's the case in `PhaseIdealLoop::intrinsify_fill()`: we can end up there with more than one `FramePtr` projection because the code pattern used elsewhere is to add one more projection and let identical projections be commoned during IGVN. Also in `PhaseIdealLoop::fix_ctrl_uses()`, the loop exit can have 3 projections in some cases. That's a transitory state before everything is wired correctly (a case where we are lucky). 3. I tried to abstract away the many ways we iterate over projections in this patch.
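
The idea in point 2, asserting when a single-projection lookup finds several matches and offering an iteration helper for callers that must handle more than one, can be sketched roughly as below. This is a minimal standalone model: `Proj`, `MultiNode`, `unique_proj_out_or_null` and `for_each_proj_out` are hypothetical names chosen for illustration and are not the HotSpot classes or the helpers added by the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a multi-output node and its projections.
struct Proj {
  unsigned con;  // which output of the parent this projection selects
};

struct MultiNode {
  std::vector<Proj> outs;
};

// Returns the only projection with the given `con`, or nullptr if there is
// none. Asserts (in debug builds) if more than one projection matches, so
// callers that silently relied on "the first match" get flagged.
const Proj* unique_proj_out_or_null(const MultiNode& n, unsigned con) {
  const Proj* found = nullptr;
  for (const Proj& p : n.outs) {
    if (p.con == con) {
      assert(found == nullptr && "more than one matching projection");
      found = &p;
    }
  }
  return found;
}

// Visits every projection with the given `con`; intended for callers that
// can legitimately see several matching projections.
template <typename F>
void for_each_proj_out(const MultiNode& n, unsigned con, F&& visit) {
  for (const Proj& p : n.outs) {
    if (p.con == con) {
      visit(p);
    }
  }
}
```

A caller that expects exactly one projection would use the first helper and be caught by the assert in a transitory multi-projection state, while code that has to cope with several projections would iterate with the second.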
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2883651987 From roland at openjdk.org Thu May 15 12:35:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:35:55 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 25 Apr 2025 13:06:17 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/escape.cpp >> >> Co-authored-by: Emanuel Peter >> - Update src/hotspot/share/opto/escape.cpp >> >> Co-authored-by: Emanuel Peter > > src/hotspot/share/opto/escape.cpp line 4879: > >> 4877: const TypePtr* at = mem->adr_type(); >> 4878: uint alias_idx = (uint) _compile->get_alias_index(at->is_ptr()); >> 4879: if (idx == i) { > > Could you rename this and all other occurrences of `idx` below to make the changeset buildable again? Sorry @robcasloz, I was in the middle of a significant update making the existing PR effectively outdated.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2091067387 From roland at openjdk.org Thu May 15 12:35:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:35:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 28 Apr 2025 09:21:44 GMT, Galder Zamarre?o wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/share/opto/escape.cpp >> >> Co-authored-by: Emanuel Peter >> - Update src/hotspot/share/opto/escape.cpp >> >> Co-authored-by: Emanuel Peter > > src/hotspot/share/opto/library_call.cpp line 5554: > >> 5552: if (proj->_con == TypeFunc::Memory) { >> 5553: int alias_idx = C->get_alias_index(proj->adr_type()); >> 5554: assert(alias_idx == Compile::AliasIdxRaw || alias_idx == elemidx || alias_idx == mark_idx || alias_idx == klass_idx, "should be raw memory or array element type"); > > Shouldn't this `assert` be wrapped around an `#ifdef ASSERT` section? `assert` is a nop if `ASSERT` not defined. Does that answer your question? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2091068927 From roland at openjdk.org Thu May 15 12:39:54 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 12:39:54 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Mon, 28 Apr 2025 13:20:45 GMT, Roberto Casta?eda Lozano wrote: > Thanks for working on this, Roland! 
> A "dumb" question: could the issue also be addressed by ensuring that dead allocations are removed earlier (e.g. in the call to `PhaseMacroExpand::eliminate_allocate_node` performed as part of escape analysis/scalar replacement, before loop optimizations)? It seems this would also prevent the miscompilations in `TestEliminationOfAllocationWithoutUse`, no?

I don't think that would work (but haven't tried). The problem is that the memory graph is broken around `Initialize` nodes from the time they are added to the IR (parse time) and that's only exposed once they are removed from the graph. But I don't think it matters when they are removed. What you suggest does also sound quite conservative: it could very well be that an allocation loses all its uses after some rounds of optimization but in the scheme you suggest, that allocation wouldn't be optimized out.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2883668589

From roland at openjdk.org Thu May 15 12:39:54 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 15 May 2025 12:39:54 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed
In-Reply-To:
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
Message-ID:

On Wed, 23 Apr 2025 12:36:31 GMT, Emanuel Peter wrote:

>>> It would be great if we have union memory slices for this.
>>
>> Something like that would fix it but it would be trickier to get right than this point fix, I think. Do you see any other use for it?
>
> @rwestrel You should update the bug title, it just sounds too generic. It is really helpful when the "blame" history gives you a helpful comment rather than `Umbrella` or `incorrect result` or some bug numbers.
>
> I suggest `Use NarrowMemProj to project / split memory slice after Initialize`, but you probably have an even better idea ;)

@eme64 ready for another round of reviews when you have time

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2883669669

From dlunden at openjdk.org Thu May 15 12:42:56 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 15 May 2025 12:42:56 GMT
Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7]
In-Reply-To:
References:
Message-ID: <8DoI4IJ99wzvpeejqiC8nXIEpNADQA4vjOKfpXOvYL0=.9da304e1-8b3e-448d-9917-362cd8b93dc2@github.com>

On Tue, 29 Apr 2025 04:58:39 GMT, Galder Zamarreño wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update after comments from Christian
>
> src/hotspot/share/opto/gcm.cpp line 889:
>
>> 887: // since the load will be forced into a block preceding the Phi.
>> 888: pred_block->set_raise_LCA_mark(load_index);
>> 889: assert(!LCA_orig->dominates(pred_block) ||
>
> Has this assert moved elsewhere? Or do we really want to remove it altogether?

Resolving this now, hopefully I've answered your question @galderz!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2091081640 From dlunden at openjdk.org Thu May 15 12:42:57 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 12:42:57 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7] In-Reply-To: References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> <21d6dUz886V6-TbWTNyt22KX6UaBNbsfZQz3hnQVjNA=.edb3bbd3-0dba-4e54-92cf-88e680cc5149@github.com> Message-ID: On Wed, 7 May 2025 09:28:37 GMT, Tobias Hartmann wrote: >> Thanks @TobiHartmann. Note that this changeset does remove the assert, and replaces it with another (stronger, in theory) assert. > > Ah right, I missed that. @robcasloz suggested that I simply add the new assert to the commit just before the fix which `test4` is a regression test for (much simpler than trying to revert the fix in mainline), and check if it triggers when the old assert triggers. When I manually disable loop strip mining verification (as Tobias suggested), the new assert triggers, as expected, whenever the old assert triggers. Resolving this thread now! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2091078123 From dlunden at openjdk.org Thu May 15 12:50:57 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 12:50:57 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7] In-Reply-To: <36IC7thNmhW7_AN1MsC4XWNGhBAmhlQweEylKPAENiA=.9df24353-0e3e-4092-8f59-7cf28cbbdeb7@github.com> References: <36IC7thNmhW7_AN1MsC4XWNGhBAmhlQweEylKPAENiA=.9df24353-0e3e-4092-8f59-7cf28cbbdeb7@github.com> Message-ID: On Wed, 30 Apr 2025 09:48:39 GMT, Daniel Lund?n wrote: >> src/hotspot/share/opto/gcm.cpp line 912: >> >>> 910: // they CAN write to Java memory. >>> 911: if (muse->ideal_Opcode() == Op_CallStaticJava) { >>> 912: assert(muse->is_MachSafePoint(), ""); >> >> I know there was not assert message before, but can we use the opportunity to add a meaningful message for this assert? There's another empty message assert a few lines before. > > Thanks for the comments @galderz! I do not know the specifics of this particular part of `insert_anti_dependences`. I could add generic assert messages, based on the checks, for the purpose of avoiding empty messages. But I'm not sure those are then meaningful messages. Resolving this as well now; I cannot add assert messages that provide more information than the checks themselves. Let's leave any changes for a future RFE. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2091095671

From dlunden at openjdk.org Thu May 15 12:58:01 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 15 May 2025 12:58:01 GMT
Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 15 May 2025 11:56:42 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update after comments from Christian
>
> Marked as reviewed by rcastanedalo (Reviewer).

Thanks for the reviews @robcasloz and @chhagedorn! Only documentation changes since the initial PR commit, and GHA is clean after https://github.com/openjdk/jdk/pull/24926/commits/aec59a1570445c7bcaeef25eb82d00dba7890e2e. Integrating now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24926#issuecomment-2883713352

From dlunden at openjdk.org Thu May 15 12:58:02 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 15 May 2025 12:58:02 GMT
Subject: Integrated: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences
In-Reply-To:
References:
Message-ID: <63Icxp79H2FTx5RbawhTAH0_YQ9cVkzxdIu-qwszz_4=.a0d81628-8071-4c5c-8ca2-f6bc9aa12634@github.com>

On Mon, 28 Apr 2025 15:28:52 GMT, Daniel Lundén wrote:

> The current documentation for `PhaseCFG::insert_anti_dependences` is difficult to follow and sometimes even misleading. We should ensure the method is appropriately documented.
>
> ### Changeset
>
> - Rename `PhaseCFG::insert_anti_dependences` to `PhaseCFG::raise_above_anti_dependences`.
>   The purpose of `PhaseCFG::raise_above_anti_dependences` is twofold: raise the load's LCA so that the load is scheduled before anti-dependent stores, and if necessary add anti-dependence edges between the load and certain anti-dependent stores (to ensure we later "raise" the load before anti-dependent stores in LCM). The name `PhaseCFG::insert_anti_dependences` suggests that we only add anti-dependence edges. The name `PhaseCFG::raise_above_anti_dependences`, therefore, seems more appropriate.
> - Significantly add to and revise the source code documentation of `PhaseCFG::raise_above_anti_dependences`.
> - Add, move, and revise `assert`s in `PhaseCFG::raise_above_anti_dependences`, including improved `assert` messages in a few places.
> - In the main worklist loop of `PhaseCFG::raise_above_anti_dependences`:
>   - Clean up how we identify the search root (avoid mutation).
>   - Add a missing early exit for `Phi` nodes when `LCA == early`.
>
> ### Testing
>
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/14706896111)
> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64.

This pull request has now been integrated.
Changeset: 5cb23171 Author: Daniel Lund?n URL: https://git.openjdk.org/jdk/commit/5cb231714f364064bb5a59db8eb07d43823478eb Stats: 337 lines in 7 files changed: 185 ins; 42 del; 110 mod 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24926 From chagedorn at openjdk.org Thu May 15 13:04:04 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 13:04:04 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Fri, 9 May 2025 12:41:14 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. 
>>
>> While this change is in principle simple, I ran into a few
>> implementation issues:
>>
>> - while c2 has a way to compute the number of iterations of an int
>>   counted loop, it doesn't have that for long counted loops. The
>>   existing logic for int counted loops promotes values to long to
>>   avoid overflows. I reworked it so it now works for both long and int
>>   counted loops.
>>
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>>   the new predicate. Given the number of iterations is narrowed down
>>   by the predicate, the limit of the loop after transformation is a
>>   cast node that's control dependent on the short running loop
>>   predicate. Because once the counted loop is transformed, it is
>>   likely that range check predicates will be inserted and they will
>>   depend on the limit, the short running loop predicate has to be the
>>   one that's further away from the loop entry. Now it is also possible
>>   that the limit before transformation depends on a predicate
>>   (TestShortRunningLongCountedLoopPredicatesClone is an example), we
>>   can have: new predicates inserted after the transformation that
>>   depend on the casted limit that itself depends on old predicates
>>   added before the transformation. To solve this circular dependency,
>>   parse and assert predicates are cloned between the old predicates
>>   and the loop head. The cloned short running loop parse predicate is
>>   the one that's used to insert the short running loop predicate.
>>
>> - In the case of a long counted loop, the loop is transformed into a
>>   regular loop with a ...
>
> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:
>
> - Emanuel's review
> - Christian's review

Sorry for the delay, here are some more mostly minor comments. Will make another pass again afterwards.
src/hotspot/share/opto/c2_globals.hpp line 838: > 836: \ > 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ > 838: "long counted loop/long range checks: don't create loop nest if" \ Suggestion: "long counted loop/long range checks: don't create loop nest if " \ src/hotspot/share/opto/c2_globals.hpp line 839: > 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ > 838: "long counted loop/long range checks: don't create loop nest if" \ > 839: "loop runs for small enough number of iterations") \ Suggestion: "loop runs for small enough number of iterations.") \ src/hotspot/share/opto/c2_globals.hpp line 842: > 840: \ > 841: develop(bool, StressShortRunningLongLoop, false, \ > 842: "Speculate all long counted loops are short running when bounds" \ Suggestion: "Speculate all long counted loops are short running when bounds " \ src/hotspot/share/opto/c2_globals.hpp line 843: > 841: develop(bool, StressShortRunningLongLoop, false, \ > 842: "Speculate all long counted loops are short running when bounds" \ > 843: "are unknown even if profile data doesn't say so") \ Suggestion: "are unknown even if profile data doesn't say so.") \ src/hotspot/share/opto/castnode.cpp line 416: > 414: } > 415: } > 416: // if it's a cast created by PhaseIdealLoop::create_loop_nest(), don't transform it until the counted loop is created Maybe be more specific here? Suggestion: // If it's a cast created by PhaseIdealLoop::short_running_loop(), don't transform it until the counted loop is created src/hotspot/share/opto/graphKit.cpp line 4055: > 4053: if (ShortRunningLongLoop) { > 4054: // Will narrow the limit down with a cast node. Predicates added later may depend on the cast so should be last when > 4055: // starting from the loop. Suggestion: // walking up from the loop. 
src/hotspot/share/opto/loopnode.cpp line 1128: > 1126: class NodeInShortLoopBody : public NodeInLoopBody { > 1127: PhaseIdealLoop* _phase; > 1128: IdealLoopTree* _ilt; The pointers can be made const: Suggestion: PhaseIdealLoop* const _phase; IdealLoopTree* const _ilt; src/hotspot/share/opto/loopnode.cpp line 1172: > 1170: // Only process if we are in the correct Predicate Block. > 1171: return; > 1172: } Do we really need this check? Could we not just clone all Template Assertion Predicates that we find? I think with the recent Assertion Predicate changes, we are sure that all Template Assertion Predicates found belong to this loop. Otherwise, they would already be marked useless and `visit()` is not called on them. src/hotspot/share/opto/loopnode.cpp line 1180: > 1178: > 1179: // If bounds are known, or profile data indicates it runs for a small enough number of iterations, so the loop doesn't > 1180: // need an outer loop, don't create the outer loop I suggest to add more details here from the first paragraph of the PR description where you describe the motivation. Maybe something like: // If the loop is either statically known to run for a small enough number of iterations or if profile data indicates // that, we don't want an outer loop for the following reasons: // // // In the short running case, turn the loop into a regular loop again and transform the long range checks: // - LongCountedLoop: Create LoopNode but keep the loop limit type with a CastLL node to avoid that we later try to // create a Loop Limit Check when turning the LoopNode into a CountedLoopNode. // - CountedLoop: Can be reused. src/hotspot/share/opto/loopnode.cpp line 1191: > 1189: loop->compute_trip_count(this, bt); > 1190: // Loop must run for no more than iter_limits as it guarantees no overflow of scale * iv in long range checks. 
> 1191: bool known_short_running_loop = head->trip_count() <= iters_limit / ABS(stride_con); Can you also add a comment about the decision of the hardcoded `iters_limit / ABS(stride_con)` limit to indicate a short running long loop? src/hotspot/share/opto/loopnode.cpp line 1198: > 1196: profile_short_running_loop = true; > 1197: } else { > 1198: profile_short_running_loop = !head->is_profile_trip_failed() && head->profile_trip_cnt() < iters_limit / ABS(stride_con); Why do we compare with `<=` above but here with `<`? src/hotspot/share/opto/loopnode.cpp line 1225: > 1223: // Predicate). The current limit could, itself, be dependent on an existing predicate. Clone parse and template > 1224: // assertion predicates below existing predicates to get proper ordering of predicates when walking from the loop > 1225: // up: future predicates, Short Running Long Loop Predicate, existing predicates. Maybe you missed the visualization I've added in a comment for an earlier commit. I would find it quite useful to quickly grasp the idea, what do you think? // // Existing Hoisted // Check Predicates // | // New Short Running Long // Loop Predicate // | // Cloned Parse Predicates and // Template Assertion Predicates // | // Loop src/hotspot/share/opto/loopnode.cpp line 1263: > 1261: #ifndef PRODUCT > 1262: // report that the loop predication has been actually performed > 1263: // for this loop The trace message below already suggests that the predicate was added. So you might want to just remove this comment. Suggestion: src/hotspot/share/opto/loopnode.cpp line 1265: > 1263: // for this loop > 1264: if (TraceLoopLimitCheck) { > 1265: tty->print_cr("Short Loop Check generated:"); Suggestion: tty->print_cr("Short Long Loop Check Predicate generated:"); src/hotspot/share/opto/loopnode.cpp line 1269: > 1267: } > 1268: #endif > 1269: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl); It looks like this line rather belongs to the `Predicate` on L1275? 
Might have been moved here by accident. src/hotspot/share/opto/loopnode.cpp line 1273: > 1271: // We're turning a long counted loop into a regular loop that will be converted into an int count loop. That loop > 1272: // won't need loop limit checks (iters_limit guarantees that). Add a cast to make sure that, whatever transformation > 1273: // happens by the time the counted loop is created, c2 knows enough about the loop's limit that it doesn't try to Suggestion: // happens by the time the counted loop is created, C2 knows enough about the loop's limit that it doesn't try to src/hotspot/share/opto/loopnode.cpp line 1290: > 1288: > 1289: Node* int_zero = _igvn.intcon(0); > 1290: set_ctrl(int_zero, C->root()); I think you can just use `PhaseIdealLoop::intcon()` which takes care of `set_ctrl()`: Suggestion: Node* int_zero = intcon(0); src/hotspot/share/opto/loopnode.cpp line 1298: > 1296: // Clone the iv data nodes as an integer iv > 1297: Node* int_stride = _igvn.intcon(stride_con); > 1298: set_ctrl(int_stride, C->root()); Same here: Suggestion: Node* int_stride = intcon(stride_con); src/hotspot/share/opto/loopnode.cpp line 1299: > 1297: Node* int_stride = _igvn.intcon(stride_con); > 1298: set_ctrl(int_stride, C->root()); > 1299: Node* inner_phi = new PhiNode(head, new_phi_t); You only seem to define `new_phi_t` as `TypeInt::INT` and then use it here once. Maybe just inline it? 
Suggestion: Node* inner_phi = new PhiNode(head, TypeInt::INT); src/hotspot/share/opto/loopnode.cpp line 1302: > 1300: Node* inner_incr = new AddINode(inner_phi, int_stride); > 1301: Node* inner_cmp = nullptr; > 1302: inner_cmp = new CmpINode(inner_incr, new_limit); Can be merged: Suggestion: Node* inner_cmp = new CmpINode(inner_incr, new_limit); src/hotspot/share/opto/loopnode.hpp line 310: > 308: > 309: void set_trip_count(julong tc) { _trip_count = checked_cast(tc); } > 310: julong trip_count() { return _trip_count; } Can be made const: Suggestion: julong trip_count() const { return _trip_count; } src/hotspot/share/opto/loopnode.hpp line 398: > 396: > 397: void set_trip_count(julong tc) { _trip_count = tc; } > 398: julong trip_count() { return _trip_count; } Can be made const: Suggestion: julong trip_count() const { return _trip_count; } src/hotspot/share/opto/predicates.cpp line 945: > 943: tty->print_cr("- Loop Predicate Block:"); > 944: _loop_predicate_block.dump(" "); > 945: tty->print_cr("- Short Running Loop Predicate Block:"); Suggestion: tty->print_cr("- Short Running Long Loop Predicate Block:"); src/hotspot/share/opto/predicates.hpp line 77: > 75: * be added once above the Loop Limit Check Parse Predicate for a loop. > 76: * - Short Short: This predicate is created when a long counted loop is transformed into an int counted > 77: * Running Long loop. In the general, that transformation requires an outer loop to guarantee that the new Suggestion: * Running Long loop. 
In general, that transformation requires an outer loop to guarantee that the new ------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2842765753 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090598643 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090599557 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090598904 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090599820 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091094942 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091096027 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091097871 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091107235 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090658362 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091119720 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091051479 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090861487 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091056745 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091054533 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091078881 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091058908 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091091033 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091091490 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091067560 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091060848 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091091968 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091092991 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21630#discussion_r2090855070 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090854689 From chagedorn at openjdk.org Thu May 15 13:04:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 13:04:05 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 12:10:05 GMT, Roland Westrelin wrote: >> src/hotspot/share/runtime/deoptimization.hpp line 122: >> >>> 120: Reason_short_running_long_loop, // profile reports loop runs for small number of iterations >>> 121: #if INCLUDE_JVMCI >>> 122: Reason_aliasing = Reason_short_running_long_loop, // optimistic assumption about aliasing failed >> >> Why is that required? > > Otherwise, this assert: > > > assert((1 << _reason_bits) >= Reason_LIMIT, "enough bits"); > > > fails. Rather than tweak the allocation of bits to `_action_bits`, `_reason_bits`, `_debug_id_bits`, to extend `_reason_bits`, I thought it was simpler to have c2 and graal share the encoding of a reason given graal doesn't use the new `Reason_short_running_long_loop` and c2 doesn't use the jvmci specific `Reason_aliasing`. Makes sense, thanks for the explanation! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2090856178 From jkarthikeyan at openjdk.org Thu May 15 13:11:56 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 15 May 2025 13:11:56 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: References: Message-ID: <_mVZnlA3UyELotkqHbaYTv0GeUfUc_Q4rirfUClbje4=.f6b5f69d-6020-4b17-8906-a750d138cd43@github.com> On Mon, 12 May 2025 03:11:52 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. 
>> Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>
>>                                                  Baseline                    Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>>
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Check for AVX2 for byte/long conversions

This is a good point. While testing, I experimented with patterns like this:

    private static short[] testSubwordVector(short[] out, int[] in) {
        for (int i = 0; i < 512; i++) {
            out[i] = (short) (((short) in[i]) + (short) in[i]);
        }

        return out;
    }

The IR it produces looks like: `StoreC(AddI(RShiftI(LShiftI(LoadI, 16), 16)`. The same thing happens for sign extension as well. I didn't investigate too deeply, but I think the shifts prevent this pattern from vectorizing. The shifts are needed in the scalar IR since we don't have an `AddS` node, but in the future, when translating the IR to the vector graph we could convert the shift pattern into a `VectorCastX2Y` node as well.
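
For readers less familiar with this IR shape, here is a minimal sketch of why the shift pair shows up. It is plain Java and independent of the patch; the class and method names (`SubwordShiftDemo`, `viaCast`, `viaShifts`) are made up for illustration. It only checks the arithmetic identity behind the pattern: shifting an `int` left by 16 and then arithmetically right by 16 gives the same value as narrowing it to `short` and widening it back. It makes no claim about what C2 does or does not vectorize.

```java
// Illustrative only: hypothetical names, not part of the JDK or of this PR.
class SubwordShiftDemo {
    // Narrowing to short keeps the low 16 bits; widening back sign-extends them.
    static int viaCast(int x) {
        return (short) x;
    }

    // The same result written as an int-only shift pair,
    // i.e. the source-level analogue of RShiftI(LShiftI(x, 16), 16).
    static int viaShifts(int x) {
        return (x << 16) >> 16;
    }

    public static void main(String[] args) {
        int[] samples = {
            0, 1, -1, 32767, 32768, 65535, 65536,
            Integer.MIN_VALUE, Integer.MAX_VALUE, 0x12348765
        };
        for (int x : samples) {
            if (viaCast(x) != viaShifts(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("cast and shift pair agree on all samples");
    }
}
```

Since the two forms are interchangeable on scalar values, recognizing the shift pair as a narrowing cast during the translation to the vector graph would be a matter of pattern matching rather than a change in semantics.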
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2883759173 From chagedorn at openjdk.org Thu May 15 13:13:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 13:13:53 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 12:12:31 GMT, Marc Chevalier wrote: >> Adding a `StressLoopPeeling` dev flag that randomize peeling. >> >> ## Semantics >> >> For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. >> >> This requires to distinguish two things: >> - not inlining because it's not legal: see for instance >> ```cpp >> assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); >> ``` >> in `PhaseIdealLoop::do_peeling` >> - not inlining because it doesn't seem profitable. >> >> Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! >> >> Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. >> >> I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. 
Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. >> >> >> >> ## The Flag >> >> The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. >> >> But once again: let'... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Back to PRODUCT for consistency One suggestion, otherwise, it looks good to me! Worth mentioning, this stress flag already found a bug: [JDK-8356084](https://bugs.openjdk.org/browse/JDK-8356084) src/hotspot/share/opto/loopTransform.cpp line 523: > 521: // In case of stress, let's just pick randomly... > 522: return phase->C->random() % 2 == 0 ? estimate : 0; > 523: } Nice, looks much better :-) `_peeling_opportunities_count` could be confused with real opportunities without stressing. Maybe we can rename it to `_stress_peeling_opportunities`? ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25140#pullrequestreview-2843635214 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091136460 From epeter at openjdk.org Thu May 15 13:18:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:18:02 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: References: Message-ID: <3bpbyYd0EThEBJwkRk4FkijqvN_4YHm_QuOIiYw5234=.b4d12b58-0483-4ceb-878d-be6c972f8f85@github.com> On Mon, 12 May 2025 03:11:52 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ? 19.787 ns/op 56.228 ? 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ? 1.539 ns/op 43.332 ? 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ? 6.150 ns/op 29.757 ? 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! 
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Check for AVX2 for byte/long conversions > This is a good point, while testing I experimented with patterns like this: > > ```java > private static short[] testSubwordVector(short[] out, int[] in) { > for (int i = 0; i < 512; i++) { > out[i] = (short) (((short) in[i]) + (short) in[i]); > } > > return out; > } > ``` > > The IR it produces looks like: `StoreC(AddI(RShiftI(LShiftI(LoadI, 16), 16)`. The same thing happens for sign extension as well. I didn't investigate too deeply, but I think the shifts prevent this pattern from vectorizing. The shifts are needed in the scalar IR since we don't have a `AddS` node, but in the future, when translating the IR to the vector graph we could convert the shift pattern into a `VectorCastX2Y` node as well. I suppose there are 2 options here, when vectorizing: - Cast between `short <-> int`, do the add in `int`. - Somehow detect that this is an "`AddS`" in the type analysis phase of SuperWord. And then hack the graph so that we do not need the shifts. This would be more complicated, but might give us better results in the end. Is there already an RFE for this? If not, would you mind filing one? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2883775942 From thartmann at openjdk.org Thu May 15 13:22:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 15 May 2025 13:22:02 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 12:12:31 GMT, Marc Chevalier wrote: >> Adding a `StressLoopPeeling` dev flag that randomize peeling. >> >> ## Semantics >> >> For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. 
>> >> This requires to distinguish two things: >> - not inlining because it's not legal: see for instance >> ```cpp >> assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); >> ``` >> in `PhaseIdealLoop::do_peeling` >> - not inlining because it doesn't seem profitable. >> >> Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! >> >> Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. >> >> I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. >> >> >> >> ## The Flag >> >> The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. 
A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. >> >> But once again: let'... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Back to PRODUCT for consistency Thanks for making these changes. Looks good to me, I just added some minor suggestions. src/hotspot/share/opto/c2_globals.hpp line 838: > 836: \ > 837: develop(bool, StressLoopPeeling, false, \ > 838: "Randomize peeling decision") \ Suggestion: "Randomize loop peeling decision") \ src/hotspot/share/opto/loopTransform.cpp line 522: > 520: loop_head->_peeling_opportunities_count++; > 521: // In case of stress, let's just pick randomly... > 522: return phase->C->random() % 2 == 0 ? estimate : 0; Suggestion: return ((phase->C->random() % 2) == 0) ? estimate : 0; src/hotspot/share/opto/loopnode.hpp line 137: > 135: > 136: #ifndef PRODUCT > 137: uint _peeling_opportunities_count = 0; I think this should rather be named `_stress_peeling_attempts` or something. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25140#pullrequestreview-2843482129 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091156892 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091043445 PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091155929 From thartmann at openjdk.org Thu May 15 13:22:05 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 15 May 2025 13:22:05 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 12:13:58 GMT, Marc Chevalier wrote: >> Should I use `#ifdef ASSERT`? 
My thinking is that I want the code to be there whenever the flag is available. This is decided here: >> https://github.com/openjdk/jdk/blob/a989245a2424d136f5d2a828eda666c3867b0f48/src/hotspot/share/runtime/flags/jvmFlag.cpp#L552-L556 >> (called from `JVMFlag::find_flag` with `return_flag == false`, from `Arguments::find_jvm_flag`, from `Arguments::parse_argument` from `Arguments::process_argument`) with >> https://github.com/openjdk/jdk/blob/a989245a2424d136f5d2a828eda666c3867b0f48/src/hotspot/share/runtime/flags/jvmFlag.cpp#L60-L66 >> >> My thinking is that it's unintuitive to me to offer a flag that could have no effect. Why don't we want it in optimized non-product build? We can still stress-peel and even if we won't hit an assert, we could still observe unexpected behaviors or crashes. Does that make sense? > > As discussed, I came back to `PRODUCT` for consistency. Thanks for the details! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091158677 From epeter at openjdk.org Thu May 15 13:25:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:25:21 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v23] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: The big refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/ed1eead6..3bc12ca9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=21-22 Stats: 1014 lines in 15 files changed: 381 ins; 369 del; 264 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Thu May 15 13:28:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:28:58 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Tue, 13 May 2025 17:40:40 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Extend comments in zLoadP implementations to explain role of reload Two small nits/questions, but otherwise ready from my side :) src/hotspot/share/opto/lcm.cpp line 80: > 78: > 79: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { > 80: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); I usually check `n->is_CFG()`. What is the bottom type of an `IfNode`?
`virtual const Type *bottom_type() const { return TypeTuple::IFBOTH; }` Are you aware of that? src/hotspot/share/opto/lcm.cpp line 97: > 95: > 96: void PhaseCFG::ensure_node_is_at_block_or_above(Node* n, Block* b) { > 97: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); Same question here test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java line 40: > 38: different GC memory accesses. > 39: * @library /test/lib / > 40: * @run driver compiler.gcbarriers.TestImplicitNullChecks I suppose you could still have a special run with `ZGC` and one with `G1GC`. But not sure if that is worth it, or if we do that in higher tiers anyway? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2843684261 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2091166322 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2091166890 PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2091170584 From epeter at openjdk.org Thu May 15 13:34:16 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:34:16 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v24] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix small things ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/3bc12ca9..61b2119b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=22-23 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Thu May 15 13:34:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:34:17 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v17] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:13:08 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Review suggestions by Roberto >> >> Co-authored-by: Roberto Castañeda Lozano > > Thank you, Emanuel, for working on this! I'm already looking forward to using it. > > I did a superficial pass to get an overview and gain understanding. Apart from some typos, my main concern is reproducibility with the randomness introduced in `NameSet`. @mhaessig @chhagedorn @robcasloz I have completed the refactoring. It should be ready for re-review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2883831325 From epeter at openjdk.org Thu May 15 13:37:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 13:37:40 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/61b2119b..a4efe454 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=23-24 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From chagedorn at openjdk.org Thu May 15 13:38:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 13:38:01 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder In-Reply-To: References: Message-ID: <0p1F5aXmj1Uhdz1FqRjjrzRpQt6akyez77gHr-cuqZE=.17c3fb5e-28f4-4240-819d-cf22e2053d8c@github.com> On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote: > Unlike what was assumed at first, it is quite different 
from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. > > Here, the problem is that the array allocation function, which throws when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers`, which uses compiled handlers instead of deopting, was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag has been used in testing since October 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default and to add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improves performance to match that of C1 (both for direct array allocation and for `StringBuilder` construction). For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), and everything seems fine, ignoring the usual noise. I've checked with @dougxc: it shouldn't impact Graal as it doesn't use `OptoRuntime`. Looks reasonable to me.
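The kind of code that exercises this path can be sketched as follows. This is an assumed shape, not the reproducer from the JBS issue: the class and method names are hypothetical, and the sketch only shows that both a direct array allocation and a `StringBuilder` construction with a negative size throw `NegativeArraySizeException` on every iteration of a hot loop.

```java
// Hypothetical hot loops that repeatedly hit the allocation exception path.
final class NegativeArrayLengthSketch {
    // Direct array allocation with a possibly negative length.
    static int countArrayFailures(int iterations, int length) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                int[] a = new int[length];
                if (a.length != length) {
                    throw new IllegalStateException("unexpected length");
                }
            } catch (NegativeArraySizeException e) {
                failures++;
            }
        }
        return failures;
    }

    // StringBuilder allocates its backing array from the requested capacity.
    static int countBuilderFailures(int iterations, int capacity) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                StringBuilder sb = new StringBuilder(capacity);
                sb.append('x');
            } catch (NegativeArraySizeException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countArrayFailures(100_000, -1));
        System.out.println(countBuilderFailures(100_000, -1));
    }
}
```

Whether such a loop deoptimizes or runs through a compiled exception handler is decided inside the VM and is not observable from the Java code itself; timing runs like the ones quoted above are needed to see the difference.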
src/hotspot/share/runtime/globals.hpp line 655: > 653: product(bool, DeoptimizeOnAllocationException, false, DIAGNOSTIC, \ > 654: "Deoptimize on exception during allocation instead of using the" \ > 655: " compiled exception handlers") \ For consistency with other flag definitions: Suggestion: product(bool, DeoptimizeOnAllocationException, false, DIAGNOSTIC, \ "Deoptimize on exception during allocation instead of using the " \ "compiled exception handlers") \ ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25149#pullrequestreview-2843683815 PR Review Comment: https://git.openjdk.org/jdk/pull/25149#discussion_r2091166029 From chagedorn at openjdk.org Thu May 15 13:44:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 13:44:55 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v4] In-Reply-To: <1AMdU-khBdc9AMeh3PxdmDPLAKvNdEggLO0478nxODw=.23a032ef-1081-4e88-b65f-e075023e5905@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> <1AMdU-khBdc9AMeh3PxdmDPLAKvNdEggLO0478nxODw=.23a032ef-1081-4e88-b65f-e075023e5905@github.com> Message-ID: On Fri, 9 May 2025 02:45:48 GMT, kuaiwei wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. 
> > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Remove cmp()/hash() for Opaque node Two minor comments, otherwise looks good. Thanks for double-checking the occurrences again! src/hotspot/share/opto/intrinsicnode.hpp line 202: > 200: virtual const Type* Value(PhaseGVN* phase) const; > 201: virtual uint size_of() const { return sizeof(EncodeISOArrayNode); } > 202: virtual uint hash() const { return Node::hash() + ascii; } It was like that before, but since you're touching the code now, can you also add a leading `_` for the `ascii` field? src/hotspot/share/opto/machnode.hpp line 546: > 544: > 545: private: > 546: bool _do_polling; While touching this class, maybe move this field declaration up to the start of the class. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25081#pullrequestreview-2843729696 PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2091195536 PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2091199648 From chagedorn at openjdk.org Thu May 15 13:51:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 15 May 2025 13:51:54 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:18:15 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Back to PRODUCT for consistency > > src/hotspot/share/opto/loopnode.hpp line 137: > >> 135: >> 136: #ifndef PRODUCT >> 137: uint _peeling_opportunities_count = 0; > > I think this should rather be named `_stress_peeling_attempts` or something. I thought about proposing "attempts" as well. But it sounds like we are actually trying loop peeling and then somehow failing, which is not the case here: we just flipped a coin and then decided not to do peeling.
Anyway, I don't have a strong opinion here :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2091222142 From epeter at openjdk.org Thu May 15 14:14:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 14:14:56 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: Message-ID: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> On Thu, 15 May 2025 02:29:11 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. > > Reviews would be appreciated! @jaskarth @chhagedorn Hmm, this means that this test could not be run with GraalVM, for example. That's a shame. Do we not have some way to assert that there should be "some fast compiler"? Ah, what does `vm.flavor == "server"` do? Because I have seen lines like `vm.flavor == "server" & !vm.graal.enabled` before. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2883965797 From epeter at openjdk.org Thu May 15 14:18:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 14:18:52 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v2] In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Mon, 12 May 2025 12:17:11 GMT, Jatin Bhateja wrote: >> PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. >> AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. >> >> e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. >> >> >> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 >> EVEX OP MR SIB DISP IMM >> --------------|----|----|----|---------------|-----| >> 62 6b c1 40 25 84 ec 40 30 20 10 ff >> >> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 >> For full vector width operation, scalar matches with vector size, hence scale N = 64 >> effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 >> EVEX OP MR SIB DISP IMM >> -------------|----|---|---|-----------|---| >> 62 6b c1 40 25 44 ec 01 ff >> >> >> Kindly review and share your feedback. 
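The disp8*N arithmetic in the two encodings above can be checked with a few lines of code. The helper below is a hypothetical illustration, not part of the HotSpot assembler. It assumes, as in the Intel SDM description of compressed displacement, that a displacement is eligible only if it is a multiple of N and the quotient fits in a signed 8-bit value.

```java
import java.util.OptionalInt;

// Hypothetical helper illustrating EVEX compressed displacement (disp8*N).
final class Disp8Sketch {
    // Returns the compressed 8-bit displacement, or empty if the full
    // 32-bit displacement encoding has to be used instead.
    static OptionalInt compressDisp8(int disp, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("scale N must be positive");
        }
        if (disp % n != 0) {
            return OptionalInt.empty(); // not a multiple of the scale
        }
        int quotient = disp / n;
        if (quotient < -128 || quotient > 127) {
            return OptionalInt.empty(); // does not fit in a signed byte
        }
        return OptionalInt.of(quotient);
    }

    public static void main(String[] args) {
        // Full 64-byte vector operation, so N = 64 as in the example above.
        System.out.println(compressDisp8(0x40, 64));       // compressible, value 1
        System.out.println(compressDisp8(0x10203040, 64)); // needs 4-byte displacement
    }
}
```

Note that `0x10203040` is itself a multiple of 64; it needs the 4-byte form because the quotient is far outside the signed byte range, while `0x40` compresses to the `01` byte shown in the second encoding.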
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Correcting tuple types in some assembler routines The fix looks reasonable to me. But I don't understand the `x64` change, so we need some from Intel to review here. The Java tests look ok to me :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25021#pullrequestreview-2843873718 From epeter at openjdk.org Thu May 15 14:18:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 14:18:53 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Mon, 12 May 2025 12:11:36 GMT, Jatin Bhateja wrote: >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) > >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! 
:) > > Please use the latest version @jatin-bhateja Tests for commit 2 / v01 have all passed :green_circle: ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2883970779 From rcastanedalo at openjdk.org Thu May 15 14:26:40 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 15 May 2025 14:26:40 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v4] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. 
Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Remove comment that is only applicable to x64, not aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/20d960e6..a52b0730 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From epeter at openjdk.org Thu May 15 14:31:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 15 May 2025 14:31:14 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v26] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Manuel: link Strings consistently ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/a4efe454..76cbd833 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=24-25 Stats: 26 lines in 2 files changed: 0 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From dlunden at openjdk.org Thu May 15 14:36:58 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 15 May 2025 14:36:58 GMT Subject: RFR: 8354767: Test crashed: assert(increase < max_live_nodes_increase_per_iteration) failed: excessive live node increase in single iteration of IGVN: 4470 (should be at most 4000) In-Reply-To: <0I8vVoHwJofrSc2QzgtYPp965OS3GoNg_mQMvmCZfh0=.cf916f04-fd03-4429-bd9f-6f77abe0b3b0@github.com> References: <15ATYTrX3CtTnuj-s2Z84wMZNwpo9Qve0OTxnwYVVYU=.82ace3c4-08c3-45e9-ab12-c71e6bc37d93@github.com> <0I8vVoHwJofrSc2QzgtYPp965OS3GoNg_mQMvmCZfh0=.cf916f04-fd03-4429-bd9f-6f77abe0b3b0@github.com> Message-ID: On Wed, 7 May 2025 20:09:15 GMT, Emanuel Peter wrote: >> Thanks for the review @robcasloz! >> >> @eme64 >>> @dlunde Ok, then let's declare this as a "quickfix", and file a follow-up RFE. Maybe it should also be declared a lower-priority bug? >> >> Sounds good to me. Yes, definitely lower priority for now. 
We do not even know if the transformation is expected or not, although I agree it looks suspicious. > > @dlunde Approved, with the assumption that you will file that follow-up RFE and link it with this issue :) @eme64 (and others) for reference: https://bugs.openjdk.org/browse/JDK-8357055 ------------- PR Comment: https://git.openjdk.org/jdk/pull/24960#issuecomment-2884051800 From roland at openjdk.org Thu May 15 14:46:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 14:46:14 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v21] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. 
> > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ...
Roland Westrelin has updated the pull request incrementally with six additional commits since the last revision: - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/c2_globals.hpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/223c9481..b0129598 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=19-20 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From mhaessig at openjdk.org Thu May 15 14:51:03 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Thu, 15 May 2025 14:51:03 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:37:40 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
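The `#name` and `$var` replacements listed above can be illustrated with a toy renderer. This is a minimal sketch of the idea only and does not use or reproduce the Template Framework API: `ToyTemplate`, `render` and the `id` parameter are hypothetical names, and the real framework presumably does considerably more (tokens, hooks, nesting).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: NOT the Template Framework implementation.
final class ToyTemplate {
    // Matches "#name" (hashtag replacement) and "$name" (unique variable name).
    private static final Pattern TOKEN =
        Pattern.compile("([#$])([A-Za-z_][A-Za-z0-9_]*)");

    // Renders one use of a template body. `id` stands in for the unique
    // number a framework would assign to each template use.
    public static String render(String body, Map<String, String> hashtags, int id) {
        Matcher m = TOKEN.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (m.group(1).equals("#")) {
                replacement = hashtags.get(name);   // format the value into the string
                if (replacement == null) {
                    throw new IllegalArgumentException("no value for #" + name);
                }
            } else {
                replacement = name + "_" + id;      // $var becomes var_<id>
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "int $var = #value; $var++;";
        // Two uses of the same body get distinct variable names, so they
        // can be concatenated without collisions.
        System.out.println(render(body, Map.of("value", "1"), 7));
        System.out.println(render(body, Map.of("value", "2"), 42));
    }
}
```

Rendering the same body with ids 7 and 42 yields `var_7` and `var_42`, which is the collision-avoidance behaviour the description attributes to `$var`.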
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix typo Thank you for the refactoring and your patience. I like the result and its simplicity a lot. I found a few typos, but otherwise it looks excellent. test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 47: > 45: /** > 46: * There can be at most one Renderer instance at any time. This is to avoid that users accidentally > 47: * render templates to strings, rather than letting them all render together. Suggestion: * There can be at most one Renderer instance at any time. This is to avoid users accidentally * rendering templates to separate strings, rather than letting them all render together. I do not understand the original sentence. My suggestion reflects what I understood. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 255: > 253: * > 254: * @param a The value for the (first) argument. > 255: * @return The template its argument applied. Suggestion: * @return The template with its argument applied. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 303: > 301: * @param a The value for the first argument. > 302: * @param b The value for the second argument. > 303: * @return The template all (two) arguments applied. Suggestion: * @return The template with all (two) arguments applied. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 378: > 376: * @param b The value for the second argument. > 377: * @param c The value for the third argument. > 378: * @return The template all (three) arguments applied. Suggestion: * @return The template with all (three) arguments applied. 
test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 453: > 451: * @param Type of the (first) argument. > 452: * @param arg0Name The name of the (first) argument for hashtag replacement. > 453: * @return An {@link Template} with one argument. Suggestion: * @return A {@link Template} with one argument. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 672: > 670: > 671: /** > 672: * Weight the {@link Name}s for the specified {@link Name.Type}. Suggestion: * Weigh the {@link Name}s for the specified {@link Name.Type}. I think here you want the verb? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 142: > 140: // Optimal would have been Java String Templates, but since those do not > 141: // currently exist, we use hashtag replacements. These can be either > 142: // defined by capturing arguments as string names, or by a "let" definition, Suggestion: // defined by capturing arguments as string names, or by using a "let" definition, There is a verb missing here. ------------- Marked as reviewed by mhaessig (Author). 
PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2843790505 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091243038 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091261403 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091260895 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091259872 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091264291 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091233289 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2091295497 From mdoerr at openjdk.org Thu May 15 14:58:57 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 15 May 2025 14:58:57 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures In-Reply-To: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: On Mon, 12 May 2025 17:56:47 GMT, Matthias Baesken wrote: > We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be checked to avoid crashes. src/hotspot/share/c1/c1_Compilation.cpp line 651: > 649: assert(msg != nullptr, "bailout message must exist"); > 650: // record the bailout for hserr envlog > 651: if (msg != nullptr) { How can it be nullptr? All callers pass some message.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25188#discussion_r2091391123 From roland at openjdk.org Thu May 15 14:59:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 14:59:18 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. 
Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/graphKit.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - ...
and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/b0129598..2164c15f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=20-21 Stats: 16 lines in 4 files changed: 0 ins; 5 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From adinn at openjdk.org Thu May 15 15:22:00 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 15 May 2025 15:22:00 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: Message-ID: <78E_Q21KGNiVZ1EC_AeoZQYufPqYWnWkHJSoptXBnMY=.c2287602-9a6d-4ff1-9e47-297a11f82d46@github.com> On Tue, 13 May 2025 18:03:09 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Update test to make it more resilient > > Signed-off-by: Ashutosh Mehra > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. 
> > Signed-off-by: Ashutosh Mehra > - Add test for using AOTCodeCache with different CompressedOops > configuration > > Signed-off-by: Ashutosh Mehra > - Add check for compressed oops base address; minor refacotring > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2188: > 2186: #endif > 2187: const char* name = SharedRuntime::stub_name(SharedStubId::deopt_id); > 2188: CodeBlob* blob = AOTCodeCache::load_code_blob(AOTCodeEntry::SharedBlob, (uint)SharedStubId::deopt_id, name); For most shared blobs we have a single entry at offset 0. However, for the deopt blob we have 3 (or 5) extra entry points which are embedded in the deopt blob as field values. Saving and restoring the blob internal state removes the need to pass a count and array of entry offsets into load_code_blob and store_code_blob at this point. That makes me wonder if we need to do the same with AdapterBlob. If we embedded the offsets that are currently stored in AdapterHandlerEntry into AdapterBlob then we could also avoid having to explicitly pass the count and array of offsets at AOT load/store for adapters. The getters in AdapterHandlerEntry could fetch them indirectly or else the entry could cache them locally when it is initialized depending on whether we care about a memory indirection. Maybe this would make things more uniform?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2091438273 From bkilambi at openjdk.org Thu May 15 15:30:43 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 15 May 2025 15:30:43 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v3] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Remove additional spaces in the aarch64_vector_ad.m4 file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25096/files - new: https://git.openjdk.org/jdk/pull/25096/files/080a0ce8..31103f69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=01-02 Stats: 26 lines in 2 files changed: 0 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25096/head:pull/25096 PR: https://git.openjdk.org/jdk/pull/25096 From bkilambi at openjdk.org Thu May 15 15:30:43 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 15 May 2025 15:30:43 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: <3GuaBfrDNFRVR8oR64xeRUJpEr_FgpxViP0AW9uxpFU=.686ee41f-8bbc-41e5-aa3c-a683fa78bbc6@github.com> On Wed, 14 May 2025 08:07:00 GMT, Bhavana Kilambi wrote: >> src/hotspot/cpu/aarch64/aarch64_vector_ad.m4 line 553: >> >>> 551: >>> 552: // vector add - predicated >>> 553: BINARY_OP_PREDICATE(vaddB, AddVB, sve_add, B) >> >> 
stylistic nit: remove the extra spaces. >> >> In the initial commit, this extra space is added due to `vaddHF`. However, in the latest commit the predicated Float16 rules are removed. > > Good catch! I'll update a PS soon. Thanks! Done. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25096#discussion_r2091457179 From roland at openjdk.org Thu May 15 15:31:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 15:31:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Thu, 15 May 2025 12:23:26 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - Emanuel's review >> - Christian's review > > src/hotspot/share/opto/loopnode.cpp line 1198: > >> 1196: profile_short_running_loop = true; >> 1197: } else { >> 1198: profile_short_running_loop = !head->is_profile_trip_failed() && head->profile_trip_cnt() < iters_limit / ABS(stride_con); > > Why do we compare with `<=` above but here with `<`? Right. No reason it's not the same. > src/hotspot/share/opto/loopnode.cpp line 1225: > >> 1223: // Predicate). The current limit could, itself, be dependent on an existing predicate. Clone parse and template >> 1224: // assertion predicates below existing predicates to get proper ordering of predicates when walking from the loop >> 1225: // up: future predicates, Short Running Long Loop Predicate, existing predicates. > > Maybe you missed the visualization I've added in a comment for an earlier commit. I would find it quite useful to quickly grasp the idea, what do you think? 
> > > // > // Existing Hoisted > // Check Predicates > // | > // New Short Running Long > // Loop Predicate > // | > // Cloned Parse Predicates and > // Template Assertion Predicates > // | > // Loop I must have missed it. Sorry about that. > src/hotspot/share/opto/loopnode.cpp line 1269: > >> 1267: } >> 1268: #endif >> 1269: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl); > > It looks like this line rather belongs to the `Predicate` on L1275? Might have been moved here by accident. I don't think that's the case. Predicates were added so `entry_control` needs to be refreshed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091458756 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091452769 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091455479 From asmehra at openjdk.org Thu May 15 15:33:57 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 15 May 2025 15:33:57 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: <78E_Q21KGNiVZ1EC_AeoZQYufPqYWnWkHJSoptXBnMY=.c2287602-9a6d-4ff1-9e47-297a11f82d46@github.com> References: <78E_Q21KGNiVZ1EC_AeoZQYufPqYWnWkHJSoptXBnMY=.c2287602-9a6d-4ff1-9e47-297a11f82d46@github.com> Message-ID: On Thu, 15 May 2025 15:19:30 GMT, Andrew Dinn wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: >> >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Update test to make it more resilient >> >> Signed-off-by: Ashutosh Mehra >> - Remove more unused code >> >> Signed-off-by: Ashutosh Mehra >> - Fix whitespace issue. Remove unused code. 
>> >> Signed-off-by: Ashutosh Mehra >> - Add test for using AOTCodeCache with different CompressedOops >> configuration >> >> Signed-off-by: Ashutosh Mehra >> - Add check for compressed oops base address; minor refacotring >> >> Signed-off-by: Ashutosh Mehra >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Remove irrelevant comment >> >> Signed-off-by: Ashutosh Mehra >> - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 > > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2188: > >> 2186: #endif >> 2187: const char* name = SharedRuntime::stub_name(SharedStubId::deopt_id); >> 2188: CodeBlob* blob = AOTCodeCache::load_code_blob(AOTCodeEntry::SharedBlob, (uint)SharedStubId::deopt_id, name); > > For most shared blobs we have a single entry at offset 0. However, for the deopt blob we have 3 (or 5) extra entry points which are embedded in the deopt blob as field values. Saving and restoring the blob internal state removes the need to pass a count and array of entry offsets into load_code_blob and store_code_blob at this point. > That makes we wonder if we need to do the same with AdapterBlob. If we embedded the offsets that are currently stored in AdapterHandlerEntry into AdapterBlob then we could also avoid having to explicitly pass the count and array of offsets at AOT load/store for adapters. The getters in AdapterHandlerEntry could fetch them indirectly or else the entry could cache them locally when it is initialized depending on whether we care about a memory indirection. Maybe this would make things more uniform? yup, I agree and I have the similar idea of storing entry points in AdapterBlob just like DeoptimizationBlob. Currently the pointer to AdapterBlob is not maintained anywhere. So the AdapterHandlerEntry would also have to maintain pointer to AdapterBlob. 
I was actually wondering if we can eliminate AdapterHandlerEntry by also storing AdapterFingerprint in the AdapterBlob. The Method can then have a pointer to AdapterBlob. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2091465626 From roland at openjdk.org Thu May 15 15:35:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 15:35:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Thu, 15 May 2025 12:53:56 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - Emanuel's review >> - Christian's review > > src/hotspot/share/opto/loopnode.cpp line 1172: > >> 1170: // Only process if we are in the correct Predicate Block. >> 1171: return; >> 1172: } > > Do we really need this check? Could we not just clone all Template Assertion Predicates that we find? I think with the recent Assertion Predicate changes, we are sure that all Template Assertion Predicates found belong to this loop. Otherwise, they would already be marked useless and `visit()` is not called on them. Well, I trust you on that. Things have changed quite a bit recently with Assertion Predicates and it's hard to keep up! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091466589 From roland at openjdk.org Thu May 15 15:39:59 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 15 May 2025 15:39:59 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Thu, 15 May 2025 13:00:19 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - Emanuel's review >> - Christian's review > > src/hotspot/share/opto/loopnode.cpp line 1191: > >> 1189: loop->compute_trip_count(this, bt); >> 1190: // Loop must run for no more than iter_limits as it guarantees no overflow of scale * iv in long range checks. >> 1191: bool known_short_running_loop = head->trip_count() <= iters_limit / ABS(stride_con); > > Can you also add a comment about the decision of the hardcoded `iters_limit / ABS(stride_con)` limit to indicate a short running long loop? Rather than `ShortLoopIter`? Or is it something else that you'd like to be better explained? 
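For readers following the thread, here is a minimal sketch of the check quoted above. It assumes the bound is meant to keep the total distance covered by the induction variable (trip count times |stride|) within `iters_limit`. The function name, parameter types and sample values are illustrative only and are not taken from the patch; the real C2 code reads these values from the loop head node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone version of the quoted condition
//   head->trip_count() <= iters_limit / ABS(stride_con)
// Preconditions assumed by this sketch: stride_con != 0 and iters_limit >= 0.
static bool short_running_sketch(uint64_t trip_count, int64_t iters_limit, int64_t stride_con) {
  const int64_t abs_stride = stride_con < 0 ? -stride_con : stride_con;
  // Integer division rounds down, so trip_count <= iters_limit / abs_stride
  // implies trip_count * abs_stride <= iters_limit.
  return trip_count <= static_cast<uint64_t>(iters_limit / abs_stride);
}
```

With a limit of 1000 and a stride of 4 (or -4), a loop of up to 250 iterations would qualify under this reading, and one of 251 iterations would not.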
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2091476799 From asmehra at openjdk.org Thu May 15 15:50:01 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 15 May 2025 15:50:01 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: <78E_Q21KGNiVZ1EC_AeoZQYufPqYWnWkHJSoptXBnMY=.c2287602-9a6d-4ff1-9e47-297a11f82d46@github.com> Message-ID: On Thu, 15 May 2025 15:31:34 GMT, Ashutosh Mehra wrote: >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2188: >> >>> 2186: #endif >>> 2187: const char* name = SharedRuntime::stub_name(SharedStubId::deopt_id); >>> 2188: CodeBlob* blob = AOTCodeCache::load_code_blob(AOTCodeEntry::SharedBlob, (uint)SharedStubId::deopt_id, name); >> >> For most shared blobs we have a single entry at offset 0. However, for the deopt blob we have 3 (or 5) extra entry points which are embedded in the deopt blob as field values. Saving and restoring the blob internal state removes the need to pass a count and array of entry offsets into load_code_blob and store_code_blob at this point. >> That makes we wonder if we need to do the same with AdapterBlob. If we embedded the offsets that are currently stored in AdapterHandlerEntry into AdapterBlob then we could also avoid having to explicitly pass the count and array of offsets at AOT load/store for adapters. The getters in AdapterHandlerEntry could fetch them indirectly or else the entry could cache them locally when it is initialized depending on whether we care about a memory indirection. Maybe this would make things more uniform? > > yup, I agree and I have the similar idea of storing entry points in AdapterBlob just like DeoptimizationBlob. Currently the pointer to AdapterBlob is not maintained anywhere. So the AdapterHandlerEntry would also have to maintain pointer to AdapterBlob. > I was actually wondering if we can eliminate AdapterHandlerEntry by also storing AdapterFingerprint in the AdapterBlob. 
The Method can then have a pointer to AdapterBlob. I will open an RFE to move entry points to AdapterBlob. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2091498971 From aph at openjdk.org Thu May 15 15:52:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 May 2025 15:52:14 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v7] In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). > > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op > MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op > MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op > MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ±
0.232 ns/op Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Delete unused label ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25147/files - new: https://git.openjdk.org/jdk/pull/25147/files/e5771988..da574ccb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147 PR: https://git.openjdk.org/jdk/pull/25147 From aph at openjdk.org Thu May 15 15:52:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 May 2025 15:52:14 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v6] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Wed, 14 May 2025 15:36:32 GMT, Andrew Dinn wrote: > Looks good to me. Sorry, please approve again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2884310593 From kvn at openjdk.org Thu May 15 15:55:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 15 May 2025 15:55:57 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: <78E_Q21KGNiVZ1EC_AeoZQYufPqYWnWkHJSoptXBnMY=.c2287602-9a6d-4ff1-9e47-297a11f82d46@github.com> Message-ID: On Thu, 15 May 2025 15:47:38 GMT, Ashutosh Mehra wrote: >> yup, I agree and I have the similar idea of storing entry points in AdapterBlob just like DeoptimizationBlob. Currently the pointer to AdapterBlob is not maintained anywhere. So the AdapterHandlerEntry would also have to maintain pointer to AdapterBlob. >> I was actually wondering if we can eliminate AdapterHandlerEntry by also storing AdapterFingerprint in the AdapterBlob. The Method can then have a pointer to AdapterBlob. 
> > I will open an RFE to move entry points to AdapterBlob. Yes, let's do that in a follow-up RFE. @adinn can you approve current changes so we can proceed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2091509067 From lucy at openjdk.org Thu May 15 15:56:51 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 15 May 2025 15:56:51 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures In-Reply-To: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: On Mon, 12 May 2025 17:56:47 GMT, Matthias Baesken wrote: > We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be checked to avoid crashes. LGTM. Martin's comment needs to be resolved, of course. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25188#pullrequestreview-2844254692 From lucy at openjdk.org Thu May 15 15:56:53 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 15 May 2025 15:56:53 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures In-Reply-To: References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: On Thu, 15 May 2025 14:56:41 GMT, Martin Doerr wrote: >> We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be checked to avoid crashes. > > src/hotspot/share/c1/c1_Compilation.cpp line 651: > >> 649: assert(msg != nullptr, "bailout message must exist"); >> 650: // record the bailout for hserr envlog >> 651: if (msg != nullptr) { > > How can it be nullptr? All callers pass some message.
What if I create a new call like char* myMsg = nullptr; . . . bailout(myMsg); and, due to complicated logic, `myMsg` is not assigned a value in all cases? General topic: future-proof code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25188#discussion_r2091509857 From aph at openjdk.org Thu May 15 16:03:44 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 15 May 2025 16:03:44 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v8] In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). > > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op > MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op > MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op > MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ±
0.232 ns/op Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Copyright format correction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25147/files - new: https://git.openjdk.org/jdk/pull/25147/files/da574ccb..283b900e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25147&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25147/head:pull/25147 PR: https://git.openjdk.org/jdk/pull/25147 From mdoerr at openjdk.org Thu May 15 16:06:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 15 May 2025 16:06:53 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures In-Reply-To: References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: <84X-kth27PmvgofIVuekTKIgZMclBEdKMXyXPlcqr4E=.c2419cf7-b1cb-499c-b62d-0da512f01f50@github.com> On Thu, 15 May 2025 15:53:19 GMT, Lutz Schmidt wrote: >> src/hotspot/share/c1/c1_Compilation.cpp line 651: >> >>> 649: assert(msg != nullptr, "bailout message must exist"); >>> 650: // record the bailout for hserr envlog >>> 651: if (msg != nullptr) { >> >> How can it be nullptr? All callers pass some message. > > What if I create a new call like > > char* myMsg = nullptr; > . . . > bailout(myMsg); > > and, due to complicated logic, `myMsg` is not assigned a value in all cases? General topic: future-proof code. I could live with the check, but we get other problems when it is nullptr because nullptr is interpreted as no bailout! 
`bool bailed_out() const { return _bailout_msg != nullptr; }` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25188#discussion_r2091530392 From jbhateja at openjdk.org Thu May 15 16:14:59 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 May 2025 16:14:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v21] In-Reply-To: <_2u1KhRhzp2BeKn9MaUQhSPH8OXCWturNkj21xz5nn4=.c40c4371-4b11-4ee6-8ef3-c29f9c6723a5@github.com> References: <_2u1KhRhzp2BeKn9MaUQhSPH8OXCWturNkj21xz5nn4=.c40c4371-4b11-4ee6-8ef3-c29f9c6723a5@github.com> Message-ID: On Wed, 7 May 2025 19:00:59 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > eimull revert fully to original version src/hotspot/cpu/x86/assembler_x86.cpp line 2517: > 2515: > 2516: void Assembler::eimull(Register dst, Register src1, Register src2, bool no_flags) { > 2517: evex_opcode_prefix_and_encode_swap(dst->encoding(), src1->encoding(), src2->encoding(), VEX_SIMD_NONE, /* MAP4 */VEX_OPCODE_0F_3C, EVEX_32bit, 0xAF, no_flags, true /* is_map1 */); Suggestion: evex_opcode_prefix_and_encode_swap(dst->encoding(), src1->encoding(), src2->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAF, no_flags, true /* is_map1 */); src/hotspot/cpu/x86/assembler_x86.cpp line 4068: > 4066: void Assembler::enegl(Register dst, Register src, bool no_flags) { > 4067: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ 
false, /* no_mask_reg */ true, /* uses_vl */ false); > 4068: int encode = evex_prefix_and_encode_ndd(dst->encoding(), src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); Since VEX_OPCODE_0F_3C is not a standard MAP4 encoding, please add /* MAP4 */ after it in these places. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2089247442 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2089250955 From jbhateja at openjdk.org Thu May 15 16:15:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 May 2025 16:15:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:04:48 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86_64.ad line 8688: >> >>> 8686: ins_pipe(ialu_reg_reg_alu0); >>> 8687: %} >>> 8688: >> >> Hi @vamsi-parasa , >> Can you also remove the pattern at line 7071? We may need to handle it differently. I understand that memory operand ordering can impact the NDD instruction opcode, but ADLC automatically creates multiple DFA match patterns for commutative operations, and in this case we don't see a pattern corresponding to line 7071 in the ADLC-generated dfa_x86.cpp, since it may have assumed the pattern to be equivalent to the one at line 7056. >> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L7056 >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L7071 > > The same may also apply to the following pattern pairs.
> > instruct xorI_rReg_mem_rReg_ndd(rRegI dst, memory src1, rRegI src2, rFlagsReg cr) > instruct xorI_rReg_rReg_mem_ndd(rRegI dst, rRegI src1, memory src2, rFlagsReg cr) > > instruct andL_rReg_mem_rReg_ndd(rRegL dst, memory src1, rRegL src2, rFlagsReg cr) > instruct andL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) > > instruct xorL_rReg_mem_rReg_ndd(rRegL dst, memory src1, rRegL src2, rFlagsReg cr) > instruct xorL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) We should prefer retaining the patterns that have opcode affinity towards the demoted instruction. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2091297765 From jbhateja at openjdk.org Thu May 15 16:15:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 May 2025 16:15:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:18:33 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix for UseAddressNop related failure > > src/hotspot/cpu/x86/x86_64.ad line 8688: > >> 8686: ins_pipe(ialu_reg_reg_alu0); >> 8687: %} >> 8688: > > Hi @vamsi-parasa , > Can you also remove the pattern at line 7071? We may need to handle it differently. I understand that memory operand ordering can impact the NDD instruction opcode, but ADLC automatically creates multiple DFA match patterns for commutative operations, and in this case we don't see a pattern corresponding to line 7071 in the ADLC-generated dfa_x86.cpp, since it may have assumed the pattern to be equivalent to the one at line 7056. > > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L7056 > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L7071 The same may also apply to the following pattern pairs.
instruct xorI_rReg_mem_rReg_ndd(rRegI dst, memory src1, rRegI src2, rFlagsReg cr) instruct xorI_rReg_rReg_mem_ndd(rRegI dst, rRegI src1, memory src2, rFlagsReg cr) instruct andL_rReg_mem_rReg_ndd(rRegL dst, memory src1, rRegL src2, rFlagsReg cr) instruct andL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) instruct xorL_rReg_mem_rReg_ndd(rRegL dst, memory src1, rRegL src2, rFlagsReg cr) instruct xorL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2091258454 From jbhateja at openjdk.org Thu May 15 16:15:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 May 2025 16:15:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 01:24:51 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
>> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Fix for UseAddressNop related failure src/hotspot/cpu/x86/x86_64.ad line 8688: > 8686: ins_pipe(ialu_reg_reg_alu0); > 8687: %} > 8688: Hi @vamsi-parasa , Can you also remove the pattern at line 7071? We may need to handle it differently. I understand that memory operand ordering can impact the NDD instruction opcode, but ADLC automatically creates multiple DFA match patterns for commutative operations, and in this case we don't see a pattern corresponding to line 7071 in the ADLC-generated dfa_x86.cpp, since it may have assumed the pattern to be equivalent to the one at line 7056. https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L7056 https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86_64.ad#L7071 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2091156463 From dhanalla at openjdk.org Thu May 15 16:31:17 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 15 May 2025 16:31:17 GMT Subject: RFR: 8315916: assert(C->live_nodes() <= C->max_node_limit()) failed: Live Node limit exceeded [v12] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 21:53:29 GMT, Dhamoder Nalla wrote: >> **Problem:** >> In the debug build, the assertion assert(C->live_nodes() <= C->max_node_limit()) is triggered during the parsing phase when the compiler creates more than 80K live nodes while scalarizing large arrays. In the release build, however, compilation proceeds until code generation and then bails out at Compile::check_node_count(), completing execution without crashing.
>> >> This discrepancy occurs because two Phi nodes are added per array element during scalar replacement, leading to a rapid increase in node count, especially when the EliminateAllocationArraySizeLimit JVM option is set high. When the assert is commented out, both builds behave similarly, bailing out during code generation after fully building the ideal graph. >> >> **Proposed Solution:** >> Introduce a bailout check during graph building in the scalar replacement phase. If the number of live nodes exceeds a defined threshold, the compiler will bail out and trigger a recompilation without Escape Analysis (EA). This prevents the construction of an excessively large graph and ensures consistency between debug and release builds. >> >> This approach is preferable because the graph may already be partially modified with scalarization-related nodes, and a clean recompilation path helps maintain compiler stability and performance. The bailout logic has been aligned with the existing mechanism in escape.cpp and refactored to reuse the same functionality where appropriate. > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > add additional run with fewer flags I'm pausing work on this PR for now due to other priorities. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20504#issuecomment-2884419129 From adinn at openjdk.org Thu May 15 16:39:58 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 15 May 2025 16:39:58 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: Message-ID: On Tue, 13 May 2025 18:03:09 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache.
It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Update test to make it more resilient > > Signed-off-by: Ashutosh Mehra > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. > > Signed-off-by: Ashutosh Mehra > - Add test for using AOTCodeCache with different CompressedOops > configuration > > Signed-off-by: Ashutosh Mehra > - Add check for compressed oops base address; minor refacotring > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 I looked through everything and could not spot any issues other than moving the adapter offsets into the blob. If we do decide to change that, it can wait until we fold in the StubGen stub save/restore -- where we will need to save the blob plus extra info about the stubs within the blob and offsets to one or more entries within each stub. We will also eventually need to clean up registration of stub entry addresses in the address table, but that has to wait until I have completed implementing generator code to produce an enum that identifies all generated entries. So I think this is ok to go in for now. ------------- Marked as reviewed by adinn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25019#pullrequestreview-2844379139 From hgreule at openjdk.org Thu May 15 16:55:38 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 15 May 2025 16:55:38 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value Message-ID: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> This change improves the precision of the `Mod(I|L)Node::Value()` functions. I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. ### Monotonicity Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). ### Testing I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). Please review and let me know what you think. ### Other The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. 
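As an illustration of the monotonicity point made earlier in this message, here is a small sketch using plain integer ranges. `RangeSketch`, `meet_sketch` and `mod_by_constant_sketch` are simplified stand-ins invented for this example, not C2's `TypeInt` API; the only assumption is that the meet of two ranges is the smallest range covering both.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Simplified stand-in for an integer type with inclusive bounds.
struct RangeSketch {
  int32_t lo;
  int32_t hi;
};

// Meet of two ranges: the smallest range that covers both inputs.
static RangeSketch meet_sketch(RangeSketch a, RangeSketch b) {
  return { std::min(a.lo, b.lo), std::max(a.hi, b.hi) };
}

// Range of x % d for a nonzero constant d and unrestricted x:
// the result always lies in -(|d| - 1) .. (|d| - 1).
static RangeSketch mod_by_constant_sketch(int32_t d) {
  const int64_t abs_d = d < 0 ? -static_cast<int64_t>(d) : static_cast<int64_t>(d);
  const int64_t m = abs_d - 1;
  return { static_cast<int32_t>(-m), static_cast<int32_t>(m) };
}
```

Starting from `>=0` (here `{0, INT32_MAX}`) and then meeting with the range for a divisor of 3 gives `{-2, INT32_MAX}`, not `-2..2`. Starting from zero (`{0, 0}`) and meeting with the same range gives exactly `{-2, 2}`, which matches the motivation given above for using `Type(Int|Long)::ZERO`.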
While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further: - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. ------------- Commit messages: - adapt uabs -> g_uabs name change - change range of mod by 0 for PhaseCCP - Improve ModLNode::Value - ModLNode::Value tests - improve ModINode::Value - ModINode::Value tests Changes: https://git.openjdk.org/jdk/pull/25254/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356813 Stats: 514 lines in 3 files changed: 477 ins; 23 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From sparasa at openjdk.org Thu May 15 17:07:14 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 15 May 2025 17:07:14 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source.
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/x86/assembler_x86.cpp Reorder comment Co-authored-by: Jatin Bhateja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/4e749710..21333a5c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=21-22 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From adinn at openjdk.org Thu May 15 17:17:00 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 15 May 2025 17:17:00 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: <78E_Q21KGNiVZ1EC_AeoZQYufPqYWnWkHJSoptXBnMY=.c2287602-9a6d-4ff1-9e47-297a11f82d46@github.com> Message-ID: <_MoyQbxwWIB15xC3tli43tJzruGl1QriOkfaBrjYasQ=.c22e5e6a-817f-4a82-9f9b-94045762cf46@github.com> On Thu, 15 May 2025 15:52:54 GMT, Vladimir Kozlov wrote: >> I will open an RFE to move entry points to AdapterBlob. > > Yes, let's do that in follow up RFE. > > @adinn can you approve current changes so we can proceed? Done! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25019#discussion_r2091645119 From asmehra at openjdk.org Thu May 15 17:22:04 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 15 May 2025 17:22:04 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v2] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 15:17:02 GMT, Andrew Dinn wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into preserve-runtime-blobs-master >> - Address Vladimir's comments >> >> Signed-off-by: Ashutosh Mehra >> - Remove irrelevant comment >> >> Signed-off-by: Ashutosh Mehra >> - Fix win64 compile failures >> >> Signed-off-by: Ashutosh Mehra >> - Fix AOTCodeFlags.java test >> >> Signed-off-by: Ashutosh Mehra >> - Fix compile failure in minimal config >> >> Signed-off-by: Ashutosh Mehra >> - Revert back changes that added AOTRuntimeConstants. >> Ensure CompressedOops::base and CompressedKlssPointers::base does not >> change in production run >> >> Signed-off-by: Ashutosh Mehra >> - Fix merge conflicts >> >> Signed-off-by: Ashutosh Mehra >> - Store/load AsmRemarks and DbgStrings in aot code cache >> >> Signed-off-by: Ashutosh Mehra >> - Add missing external address in aarch64 >> >> Signed-off-by: Ashutosh Mehra >> - ... and 1 more: https://git.openjdk.org/jdk/compare/2a4f37cc...ba612dab > > Having discussed this with @fisk it appears that the weak reference load performed by the c2i adapters will not attempt a decode. The barrier load_at method only performs a decode when the decorators include `IN_HEAP`. `resolve_weak_handle` passes in the `IN_NATIVE` decorator which implies no decode should be performed. > > So, this means we can use the adapters even if the compressed oop base differs between training run and production run. 
@adinn thanks for the review ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2884542516 From asmehra at openjdk.org Thu May 15 17:22:05 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Thu, 15 May 2025 17:22:05 GMT Subject: Integrated: 8354887: Preserve runtime blobs in AOT code cache In-Reply-To: References: Message-ID: On Sat, 3 May 2025 04:10:01 GMT, Ashutosh Mehra wrote: > [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. > This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. > `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`. This pull request has now been integrated. Changeset: c59debb3 Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/c59debb3844d009ac501a48c31822a07f00521e9 Stats: 1360 lines in 31 files changed: 1100 ins; 125 del; 135 mod 8354887: Preserve runtime blobs in AOT code cache Co-authored-by: Andrew Dinn Reviewed-by: kvn, adinn ------------- PR: https://git.openjdk.org/jdk/pull/25019 From jbhateja at openjdk.org Thu May 15 17:32:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 May 2025 17:32:56 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: Message-ID: <1idIm8tBCRQr6x6mQkqP0kOeVMw733oV7EpbXoGcQ7k=.02674143-e237-4fe2-9da3-40c8805c3ec1@github.com> On Thu, 15 May 2025 17:07:14 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
>> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/assembler_x86.cpp > > Reorder comment > > Co-authored-by: Jatin Bhateja Hi @vamsi-parasa, I don't see demotion tests being generated with python3 x86-asmtest.py --full ------------- PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2844507674 From jbhateja at openjdk.org Thu May 15 17:38:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 15 May 2025 17:38:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: Message-ID: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> On Thu, 15 May 2025 17:07:14 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/cpu/x86/assembler_x86.cpp > > Reorder comment > > Co-authored-by: Jatin Bhateja We are only handling the first variant of the NDD instruction [1] in the Python test script; please extend the script to cover the second variant [2] as well.
eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2884577990 From adinn at openjdk.org Thu May 15 17:41:52 2025 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 15 May 2025 17:41:52 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v8] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Thu, 15 May 2025 16:03:44 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
>> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
>> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
>> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
>> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Copyright format correction Still good ------------- Marked as reviewed by adinn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25147#pullrequestreview-2844529876 From qamai at openjdk.org Thu May 15 17:49:54 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 15 May 2025 17:49:54 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Thu, 15 May 2025 15:13:18 GMT, Hannes Greule wrote: > Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). Can we return `Type::TOP` instead? Besides, #17508 should be merged right after the JDK-25 fork; do you want to wait for it first? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-2884613438 From sparasa at openjdk.org Thu May 15 17:52:13 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 15 May 2025 17:52:13 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v24] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source.
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Add /*MAP 4*/ comment next to VEX_OPCODE_0F_3C ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/21333a5c..ca7ba027 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=22-23 Stats: 159 lines in 1 file changed: 0 ins; 0 del; 159 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Thu May 15 17:52:13 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 15 May 2025 17:52:13 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v21] In-Reply-To: References: <_2u1KhRhzp2BeKn9MaUQhSPH8OXCWturNkj21xz5nn4=.c40c4371-4b11-4ee6-8ef3-c29f9c6723a5@github.com> Message-ID: <8mMdxlpwB2IR3ZbL6Uyqkvbg8olGcCzPAe43rvxTWTw=.496ee861-a961-4184-a8f9-4b3a8cdd2277@github.com> On Wed, 14 May 2025 15:46:48 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> eimull revert fully to original version > > src/hotspot/cpu/x86/assembler_x86.cpp line 4068: > >> 4066: void Assembler::enegl(Register dst, Register src, bool no_flags) { >> 4067: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 4068: int encode = evex_prefix_and_encode_ndd(dst->encoding(), src->encoding(), VEX_SIMD_NONE, VEX_OPCODE_0F_3C, &attributes, no_flags); > > Since VEX_OPCODE_OF_3C is not a 
standard MAP4 encoding please add /* MAP4 */ after it in the places. Please see the updated code with the change `VEX_OPCODE_0F_3C /* MAP4 */` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2091696322 From hgreule at openjdk.org Thu May 15 18:10:51 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 15 May 2025 18:10:51 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Thu, 15 May 2025 17:47:16 GMT, Quan Anh Mai wrote: > > Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > Can we return `Type::TOP` instead? That should work too and might be more intuitive. I assume there also isn't much benefit in constant-folding users of the mod if the mod is known to fail (which seems to be the only benefit of not returning TOP?). > Besides, #17508 should be merged right after the JDK-25 fork; do you want to wait for it first? We can wait if it makes sense to do the unsigned variants here too, but I'm also fine with doing it separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-2884668466 From lucy at openjdk.org Thu May 15 20:33:54 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 15 May 2025 20:33:54 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures In-Reply-To: <84X-kth27PmvgofIVuekTKIgZMclBEdKMXyXPlcqr4E=.c2419cf7-b1cb-499c-b62d-0da512f01f50@github.com> References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> <84X-kth27PmvgofIVuekTKIgZMclBEdKMXyXPlcqr4E=.c2419cf7-b1cb-499c-b62d-0da512f01f50@github.com> Message-ID: On Thu, 15 May 2025 16:03:58 GMT, Martin Doerr wrote: >> What if I create a new call like >> >> char* myMsg = nullptr; >> . . .
>> bailout(myMsg); >> >> and, due to complicated logic, `myMsg` is not assigned a value in all cases? General topic: future-proof code. > > I could live with the check, but we get other problems when it is nullptr because nullptr is interpreted as no bailout! > `bool bailed_out() const { return _bailout_msg != nullptr; }` OK, with that knowledge and without the check, we crash when we try to log the null message. Otherwise, we crash or fail sometime later - or some strange things happen. If we follow the "fail early" principle, the additional check should not be there. We could as well convert the assert into a guarantee to enforce a "planned" crash. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25188#discussion_r2091914815 From sparasa at openjdk.org Thu May 15 20:44:46 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 15 May 2025 20:44:46 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v25] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: remove redundant reg_mem_reg match rule for commutative ALU opcodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/ca7ba027..7cfcebe9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=23-24 Stats: 75 lines in 1 file changed: 0 ins; 75 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Thu May 15 20:44:47 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 15 May 2025 20:44:47 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v25] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:22:24 GMT, Jatin Bhateja wrote: >> Same may also apply to following pattern pairs. >> >> instruct xorI_rReg_mem_rReg_ndd(rRegI dst, memory src1, rRegI src2, rFlagsReg cr) >> instruct xorI_rReg_rReg_mem_ndd(rRegI dst, rRegI src1, memory src2, rFlagsReg cr) >> >> instruct andL_rReg_mem_rReg_ndd(rRegL dst, memory src1, rRegL src2, rFlagsReg cr) >> instruct andL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) >> >> instruct xorL_rReg_mem_rReg_ndd(rRegL dst, memory src1, rRegL src2, rFlagsReg cr) >> instruct xorL_rReg_rReg_mem_ndd(rRegL dst, rRegL src1, memory src2, rFlagsReg cr) > > We should prefer retaining patterns which have opcode affinity towards the demoted instruction. Please see the updated x86_64.ad file with the redundant rReg_rReg_mem patterns removed.
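[Editorial sketch] The demotion rule in the quoted PR description (`eaddl r18, r18, r25` to REX2, `eaddl r2, r2, r7` to the legacy form) can be illustrated with a small register-to-register decision function. This is a simplified model under stated assumptions, not the code in `assembler_x86.cpp`: the names `Encoding`, `is_extended_gpr`, and `choose_encoding` are hypothetical, r16..r31 are taken to be the APX extended GPRs, and a set `no_flags` bit is assumed to block demotion because the legacy forms always write flags.

```cpp
#include <cassert>

// Hypothetical labels for the three encodings discussed in the thread.
enum class Encoding { Legacy, Rex2, Evex };

// Assumption: r16..r31 are the APX extended GPRs and need a REX2 prefix.
static bool is_extended_gpr(int reg) { return reg >= 16; }

// Pick the shortest encoding for a register-register NDD ALU operation
// "dst = src1 op src2". Illustrative only; memory operands are ignored.
static Encoding choose_encoding(int dst, int src1, int src2, bool no_flags) {
  assert(dst >= 0 && dst < 32 && src1 >= 0 && src1 < 32 && src2 >= 0 && src2 < 32);
  // A genuine three-operand form, or a flag-suppressing form, stays EVEX.
  if (no_flags || dst != src1) return Encoding::Evex;
  // dst == src1: the destructive two-operand form is enough.
  if (is_extended_gpr(dst) || is_extended_gpr(src2)) return Encoding::Rex2;
  return Encoding::Legacy;  // non-APX legacy/REX encoding
}
```

With these assumptions, `choose_encoding(18, 18, 25, false)` selects REX2 and `choose_encoding(2, 2, 7, false)` selects the legacy encoding, matching the two examples in the PR description.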
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2091928151 From vlivanov at openjdk.org Thu May 15 21:27:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 May 2025 21:27:52 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: <01bH8JnbAP-jAqlblAMBBtWmwHhv6PC-kEQ3ZLNd-FY=.f2d0f4db-01b0-4ad5-b086-9c57ad4fcebc@github.com> On Thu, 15 May 2025 09:21:34 GMT, Emanuel Peter wrote: >> **Summary** >> >> Before [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155) / https://github.com/openjdk/jdk/pull/18822, we used to prefer aligning to stores. But in that change, I removed that preference, and since then we have been aligning to loads instead (there is no preference, but since loads usually come before stores in the loop body, the load gets picked). This led to a performance regression, especially on `x64`. >> >> Especially on `x64`, it is more important to align stores than aligning loads. This is because memory operations that cross a cacheline boundary are split. And `x64` CPUs generally have more throughput for loads than for stores, so splitting a store is worse than splitting a load. >> >> On `aarch64`, the results are less clear. On two machines, the differences were marginal, but surprisingly aligning to loads was marginally faster. On another machine, aligning to stores was significantly faster. I suspect performance depends on the exact `aarch64` implementation. I'm not an `aarch64` specialist, and only have access to a limited number of machines. >> >> **Fix**: make automatic alignment configurable with `SuperWordAutomaticAlignment` (no alignment, align to store, align to load). Default is align to store. >> >> For now, I will just align to stores on all platforms. If someone has various `aarch64` machines, they are welcome to do deeper investigations. Same for other platforms.
We could always turn the flag into a platform dependent one, and set different defaults depending on the exact CPU. >> >> If you are interested you can read my investigations/benchmark results below. There are a lot of colorful plots. >> >> **FYI about Vector API:** if you are working with the Vector API, you may also want to worry about **alignment**, because there can be a **significant performance impact** (30%+ in some cases). You may also want to know about **4k aliasing**, discussed below. >> >> **Shoutout:** >> - @jatin-bhateja filed the regression, and explained that it was about split stores. >> - @mhaessig helped me talk through some of the early benchmarks. >> - @iwanowww pointed me to the 4k aliasing explanation. >> >> -------------------- >> >> **Introduction** >> >> I had long lived with the **theory that on modern CPUs, misalignment has no consequence, especially no performance impact**. When you google, many sources say that misalignment used to be an issue on older CPUs, but not any more. >> >> That may **technically** be true: >> - A ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Manuel Hässig Impressive analysis, Emanuel! Very deep, thorough, and insightful. Looks good. Speaking of Vector API, we experimented with getting access alignment under control. Unfortunately, when it comes to on-heap accesses it boils down to hyper-aligned objects support which is not there yet. PS: yay, you found a way to turn PRs into blog posts! :-) ------------- Marked as reviewed by vlivanov (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25065#pullrequestreview-2845004209 From vlivanov at openjdk.org Thu May 15 21:58:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 15 May 2025 21:58:54 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: <4vbXpgvmXv6Ba1fEkMKIRpUnXZ-QVdAZ7rgicqxVhpM=.7dda802c-9b8a-459d-9bd7-7a83d9fc1744@github.com> On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc Interesting! I wasn't aware ADLC already features such support. Thanks for the pointers. 
It does look attractive, especially for platform-specific use cases. But there are some pitfalls which make it hard to use on its own. In particular, data nodes are aggressively commoned and freely flow in the graph. Unless it is taken into account during GVN and code motion, the final schedule may end up far from optimal. (In other words, it's highly beneficial to match only expensive nodes in such a way.) Moreover, some optimizations are highly sensitive to the presence of calls. (Think of the consequences of a call scheduled inside a heavily vectorized loop.) Macro-expansion also suffers from some of those issues, but still IMO an explicit `Call` node is a more appropriate solution to the problem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2885142373 From sparasa at openjdk.org Fri May 16 01:20:57 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 16 May 2025 01:20:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v26] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source.
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with seven additional commits since the last revision: - complete tests refactoring to work with full_set - refactor demotion for full_set[4/5] - refactor demotion for full_set[3/5] - refactor demotion for full_set[2/5] - refactor RegRegRegImmNddInstruction demotion to work for full set as well - move demote loop in tests to to outer most - clean up gtest generation for full_set=False ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/7cfcebe9..65656aae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=24-25 Stats: 2824 lines in 3 files changed: 306 ins; 197 del; 2321 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Fri May 16 01:20:57 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 16 May 2025 01:20:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 22:06:31 GMT, Srinivas Vamsi Parasa wrote: >> Hi Sandhya (@sviswa7) and Jatin (@jatin-bhateja), >> >> Could you please review the refactored changes? >> >> Thanks, >> Vamsi > >> @vamsi-parasa @sviswa7 Did you already test this with `sde` and the `-future` flag? Once this is fully reviewed I can also run our internal testing, just let me know when you are ready :) > > Hi Emanuel (@eme64), > > Thank you for the message! > We're waiting for one more review from Jatin. Will let you know when that's completed. 
> > Thanks, > Vamsi > Hi @vamsi-parasa , I don't see demotion tests being generated with full mode gtest, i.e. python3 x86-asmtest.py --full Please see the updated `x86-asmtest.py` refactored to work with full set (`--full`). Please let me know if anything is missing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2885389546 From sparasa at openjdk.org Fri May 16 01:27:06 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 16 May 2025 01:27:06 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> Message-ID: <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> On Thu, 15 May 2025 17:33:03 GMT, Jatin Bhateja wrote: > We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] Hank's script is already handling the variant[2] in the `RegMemRegNddInstruction` class, for which no demotion is enabled. The demotion is enabled only for variant[1]. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2885396937 From jkarthikeyan at openjdk.org Fri May 16 03:56:55 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 16 May 2025 03:56:55 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 03:11:52 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. 
Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>>
>>                                                   Baseline                     Patch
>> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units    Score   Error  Units  Improvement
>> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op   56.228 ± 3.535  ns/op  (3.56x)
>> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op   43.332 ± 1.166  ns/op  (4.15x)
>> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op   29.757 ± 1.055  ns/op  (8.25x)
>>
>> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Check for AVX2 for byte/long conversions I searched for a bug report but didn't find one for the specific issue, so I've filed one: https://bugs.openjdk.org/browse/JDK-8357085 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2885565349 From rcastanedalo at openjdk.org Fri May 16 04:15:02 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 May 2025 04:15:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 15 May 2025 13:25:21 GMT, Emanuel Peter wrote: > But not sure if that is worth it, or if we do that in higher tiers anyway? We already run this test with G1 (default) on tier1 and with all non-default GCs (including ZGC) in Oracle's internal tier3. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092286950 From dholmes at openjdk.org Fri May 16 04:26:03 2025 From: dholmes at openjdk.org (David Holmes) Date: Fri, 16 May 2025 04:26:03 GMT Subject: RFR: 8354887: Preserve runtime blobs in AOT code cache [v6] In-Reply-To: References: Message-ID: <_7agKJ0ZAVaeBPGTHIK4jVbE8TX5TCeh2ZVAEu63CYI=.fa7dd58d-6dc6-41ef-a839-0133265d2339@github.com> On Tue, 13 May 2025 18:03:09 GMT, Ashutosh Mehra wrote: >> [8350209](https://bugs.openjdk.org/browse/JDK-8350209) introduced the framework for storing code in aot code cache and used it for caching i2c/c2i adapters. >> This PR extends the `AOTCodeCache` infrastructure and stores various runtime blobs (shared blobs, C1 and C2 runtime blobs) in the AOT code cache. It adds a new diagnostic flag `AOTStubCaching` to enable/disable the caching of these blobs. >> `AOTCodeFlags.java` test is extended to cover `AOTStubCaching`.
> > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Update test to make it more resilient > > Signed-off-by: Ashutosh Mehra > - Remove more unused code > > Signed-off-by: Ashutosh Mehra > - Fix whitespace issue. Remove unused code. > > Signed-off-by: Ashutosh Mehra > - Add test for using AOTCodeCache with different CompressedOops > configuration > > Signed-off-by: Ashutosh Mehra > - Add check for compressed oops base address; minor refacotring > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' into preserve-runtime-blobs-master > - Address Vladimir's comments > > Signed-off-by: Ashutosh Mehra > - Remove irrelevant comment > > Signed-off-by: Ashutosh Mehra > - ... and 8 more: https://git.openjdk.org/jdk/compare/d1543429...5d7c3aa0 This change has broken the Zero build - https://bugs.openjdk.org/browse/JDK-8357084 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25019#issuecomment-2885595160 From duke at openjdk.org Fri May 16 05:12:38 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 16 May 2025 05:12:38 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v5] In-Reply-To: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: > I wrote a test to check whether every C2 IR node has a correct size_of() function, and I found that some of them are missing it. They added new fields but did not add size_of() to reflect the new size. On Linux, this has not caused issues so far, because gcc allocates extra space for alignment, which can hold these additional `bool` flags. But it will report a failure on Windows. And if anyone modifies a base class, it will cause problems.
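[Editorial sketch] The `size_of()` mismatch described in the quoted PR description can be reproduced with simplified stand-in classes. Everything below is hypothetical (`Node`, `GoodNode`, `BadNode`, and `reports_full_size` are not HotSpot's real definitions); it only shows the kind of check the author describes, comparing the size a node reports with the size of its dynamic class. The base class deliberately ends in a 64-bit field so there is no tail padding that could hide the missing override, which is the masking effect the description attributes to gcc on Linux.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for a C2 IR node base class (hypothetical).
struct Node {
  std::uint64_t _idx = 0;
  virtual ~Node() = default;
  // Number of bytes this node occupies, as reported by the class itself.
  virtual std::size_t size_of() const { return sizeof(Node); }
};

// Adds a field and overrides size_of() to cover it.
struct GoodNode : Node {
  bool _flag = false;
  std::size_t size_of() const override { return sizeof(GoodNode); }
};

// Adds a field but inherits size_of(), so the new field is not counted.
struct BadNode : Node {
  bool _flag = false;
};

// True if the node reports at least its real size; T must be the
// dynamic type of the object for the comparison to be meaningful.
template <typename T>
bool reports_full_size(const T& node) {
  return node.size_of() == sizeof(T);
}
```

Here `reports_full_size(GoodNode())` holds while `reports_full_size(BadNode())` does not. As far as I understand, code that copies a node by its reported size would then silently drop `_flag` for `BadNode`, which is why the override matters.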
> > PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Minor change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25081/files - new: https://git.openjdk.org/jdk/pull/25081/files/03f84eb7..08debbd7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25081&range=03-04 Stats: 12 lines in 2 files changed: 2 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25081.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25081/head:pull/25081 PR: https://git.openjdk.org/jdk/pull/25081 From duke at openjdk.org Fri May 16 05:12:39 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 16 May 2025 05:12:39 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v4] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> <1AMdU-khBdc9AMeh3PxdmDPLAKvNdEggLO0478nxODw=.23a032ef-1081-4e88-b65f-e075023e5905@github.com> Message-ID: On Thu, 15 May 2025 13:36:41 GMT, Christian Hagedorn wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove cmp()/hash() for Opaque node > > src/hotspot/share/opto/intrinsicnode.hpp line 202: > >> 200: virtual const Type* Value(PhaseGVN* phase) const; >> 201: virtual uint size_of() const { return sizeof(EncodeISOArrayNode); } >> 202: virtual uint hash() const { return Node::hash() + ascii; } > > Was like that before but since you're touching the code now, can you also add a leading `_` for the `ascii` field? Fixed > src/hotspot/share/opto/machnode.hpp line 546: > >> 544: >> 545: private: >> 546: bool _do_polling; > > While touching this class, maybe move this field declaration up to the start of the class. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2092329048 PR Review Comment: https://git.openjdk.org/jdk/pull/25081#discussion_r2092329150 From galder at openjdk.org Fri May 16 05:30:03 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Fri, 16 May 2025 05:30:03 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Wed, 7 May 2025 11:59:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - Whitespace >> - Suggestions by Christian >> >> Co-authored-by: Christian Hagedorn >> - typo >> - For Christian: example and more intro >> - fix hashtag >> - manual merge >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - move library >> - Merge branch 'master' into JDK-8344942-TemplateFramework-v3 >> - ... and 6 more: https://git.openjdk.org/jdk/compare/0844745e...fae7ced6 > > Next batch of comments. Will probably resume tomorrow :-) > > > @chhagedorn Ok, I tried my best with the `(Un)FilledTemplate` refactoring. I'm still not sure if I want to rename `FilledTemplate` to `RenderableTemplate`, it is not super satisfying for a beginner either. Naming is hard. If anybody else has a better idea than `(Un)FilledTemplate`, please let me know ;) > > > I think one can continue reviewing this now! > > > > > > I've just quickly skimmed through this hierarchy. `(Un)FilledTemplate` reminds me a bit of the builder pattern. What about renaming `UnFilledTemplate` to `TemplateBuilder` and `FilledTemplate` to just `Template`? > > @galderz Thanks for the comment! > > For me both `UnfilledTemplate` and `FilledTemplate` are Templates. The unfilled one has the arguments not yet applied, the filled one has the argument applied. 
Calling the `UnfilledTemplate` a `TemplateBuilder` seems a little odd, because it is basically already a Template, it just has some holes that need to be filled with arguments. In that sense, it is really similar to what the Java String Template was supposed to be. Thanks for the clarification. Given that explanation `TemplateBuilder` is not right. What about `PartialTemplate` instead of `UnfilledTemplate`? It's a template that is not yet complete since it has holes that need to be filled in, so it sounds like a partially built template, hence `PartialTemplate`. To me it sounds better than `UnfilledTemplate`. Then, I would just rename `FilledTemplate` to `Template` since it's a complete template with all it needs so I don't think there's a need to add `Filled` to it. And I don't think `CompleteTemplate` is a good name either. Simply `Template`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2885672156 From bkilambi at openjdk.org Fri May 16 05:39:52 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Fri, 16 May 2025 05:39:52 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 01:00:10 GMT, Hao Sun wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > LGTM except one nit. Hi @shqking, does it look OK to you for an approval now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2885686030 From epeter at openjdk.org Fri May 16 06:24:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 06:24:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 05:27:28 GMT, Galder Zamarreño wrote: >> Next batch of comments.
Will probably resume tomorrow :-) > >> > > @chhagedorn Ok, I tried my best with the `(Un)FilledTemplate` refactoring. I'm still not sure if I want to rename `FilledTemplate` to `RenderableTemplate`, it is not super satisfying for a beginner either. Naming is hard. If anybody else has a better idea than `(Un)FilledTemplate`, please let me know ;) >> > > I think one can continue reviewing this now! >> > >> > >> > I've just quickly skimmed through this hierarchy. `(Un)FilledTemplate` reminds me a bit of the builder pattern. What about renaming `UnFilledTemplate` to `TemplateBuilder` and `FilledTemplate` to just `Template`? >> >> @galderz Thanks for the comment! >> >> For me both `UnfilledTemplate` and `FilledTemplate` are Templates. The unfilled one has the arguments not yet applied, the filled one has the arguments applied. Calling the `UnfilledTemplate` a `TemplateBuilder` seems a little odd, because it is basically already a Template, it just has some holes that need to be filled with arguments. In that sense, it is really similar to what the Java String Template was supposed to be. > > Thanks for the clarification. Given that explanation `TemplateBuilder` is not right. What about `PartialTemplate` instead of `UnfilledTemplate`? It's a template that is not yet complete since it has holes that need to be filled in, so it sounds like a partially built template, hence `PartialTemplate`. To me it sounds better than `UnfilledTemplate`. Then, I would just rename `FilledTemplate` to `Template` since it's a complete template with all it needs so I don't think there's a need to add `Filled` to it. And I don't think `CompleteTemplate` is a good name either. Simply `Template`. @galderz Thanks for the reply :) I think you missed my most recent refactoring, see https://github.com/openjdk/jdk/pull/24217#issuecomment-2882822708 I discussed it offline with @chhagedorn @mhaessig and @robcasloz and they all agreed to it already.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2885754054 From epeter at openjdk.org Fri May 16 06:28:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 06:28:15 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v27] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Manuel Hässig ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/76cbd833..0416dd72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=25-26 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 06:28:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 06:28:15 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: <7QJJnZqTbVS7CwiGylXaI_7r1yCeQcH7JqPUechIB4A=.c40078f0-89b4-4c9b-a744-7aa429a35100@github.com> Message-ID: On Thu, 15 May 2025 08:35:01 GMT, Roberto Castañeda Lozano wrote: >> @robcasloz They do
pretty much the same though, they allow you to set a hashtag replacement. It is just a question of where you can place it, and if it captures the value in a Java variable as well. >> >> What do you mean to suggest with the name `setIn`? > > Note that I suggest the name `letIn`, not `setIn`. My intuition is that the variant with a third argument binds `key` to `value` **in** the scope of a `function` that is given explicitly, hence the suggestion to call it `letIn` instead of just `let`. But it's just a suggestion, feel free to disregard if you don't think it fits. I see. Thanks for the suggestion. I think I will keep it as is :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092409438 From epeter at openjdk.org Fri May 16 06:38:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 06:38:49 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v28] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
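The `#name` and `$var` bullets in the description quoted above can be illustrated with a small stand-alone sketch. This is not the Template Framework and does not use its API: `ToyReplacements`, `instantiate`, and the integer `id` parameter are hypothetical names invented here, and the only behavior taken from the description is that `#name` is replaced by a value and `$var` becomes something like `var_7` in one template use and `var_42` in another.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: none of these names exist in the real Template Framework.
public class ToyReplacements {
    // Matches "#name" (hashtag replacement) or "$name" (name to be made unique).
    private static final Pattern MARKER = Pattern.compile("([#$])(\\w+)");

    /**
     * Expands one use of a template body.
     * "#key" is replaced by hashtags.get(key); "$name" becomes name_<id>,
     * so two uses with different ids cannot collide on variable names.
     */
    static String instantiate(String body, Map<String, String> hashtags, int id) {
        Matcher m = MARKER.matcher(body);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (m.group(1).equals("#")) {
                replacement = hashtags.get(name);
                if (replacement == null) {
                    throw new IllegalArgumentException("no value for #" + name);
                }
            } else {
                replacement = name + "_" + id;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> hashtags = new HashMap<>();
        hashtags.put("x", "5");
        String body = "int $var = #x; $var++;";
        // The same body used twice yields distinct variable names.
        System.out.println(instantiate(body, hashtags, 7));
        System.out.println(instantiate(body, hashtags, 42));
    }
}
```

In this toy the caller picks the `id`; how the real framework assigns the unique suffix is not described in this thread.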
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix return comments for Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/0416dd72..6661425f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=26-27 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 06:38:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 06:38:54 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:06:10 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typo > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 255: > >> 253: * >> 254: * @param a The value for the (first) argument. >> 255: * @return The template its argument applied. > > Suggestion: > > * @return The template with its argument applied. Changed the wording completely, because we now return a `TemplateToken` :) > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 303: > >> 301: * @param a The value for the first argument. >> 302: * @param b The value for the second argument. >> 303: * @return The template all (two) arguments applied. > > Suggestion: > > * @return The template with all (two) arguments applied. Changed the wording completely, because we now return a `TemplateToken` :) > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 378: > >> 376: * @param b The value for the second argument. >> 377: * @param c The value for the third argument.
>> 378: * @return The template all (three) arguments applied. > > Suggestion: > > * @return The template with all (three) arguments applied. Changed the wording completely, because we now return a `TemplateToken` :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092421485 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092421397 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092421326 From epeter at openjdk.org Fri May 16 07:02:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 07:02:50 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v29] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Elaborate why there is only one Renderer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/6661425f..86524219 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=27-28 Stats: 19 lines in 1 file changed: 17 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 07:02:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 07:02:52 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:48:39 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typo > > Thank you for the refactoring and your patience. I like the result and its simplicity a lot. > > I found a few typos, but otherwise it looks excellent. @mhaessig Thanks for your suggestions, I applied them all - or elaborated even further :) > test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 47: > >> 45: /** >> 46: * There can be at most one Renderer instance at any time. This is to avoid that users accidentally >> 47: * render templates to strings, rather than letting them all render together. > > Suggestion: > > * There can be at most one Renderer instance at any time. This is to avoid users accidentally > * rendering templates to separate strings, rather than letting them all render together. > > I do not understand the original sentence. My suggestion reflects what I understood. Hmm, maybe I have to use more words to be more explicit.
I replaced the two lines with this: 45 /** ~ 46 * There can be at most one Renderer instance at any time. ~ 47 * + 48 * When using nested templates, the user of the Template Framework may be tempted to first render + 49 * the nested template to a {@link String}, and then use this {@link String} as a token in an outer + 50 * {@link Template#body}. This would be a bad pattern: the outer and nested {@link Template} would + 51 * be rendered separately, and could not interact. For example, the nested {@link Template} would + 52 * not have access to the scopes of the outer {@link Template}. The inner {@link Template} could + 53 * not access {@link Name}s and {@link Hook}s from the outer {@link Template}. The user might assume + 54 * that the inner {@link Template} has access to the outer {@link Template}, but they would actually + 55 * be separated. This could lead to unexpected behavior or even bugs. + 56 * + 57 * Instead, the user should create a {@link TemplateToken} from the inner {@link Template}, and + 58 * use that {@link TemplateToken} in the {@link Template#body} of the outer {@link Template}. + 59 * This way, the inner and outer {@link Template}s get rendered together, and the inner {@link Template} + 60 * has access to the {@link Name}s and {@link Hook}s of the outer {@link Template}. + 61 * + 62 * The {@link Renderer} instance exists during the whole rendering process. Should the user ever + 63 * attempt to render a nested {@link Template} to a {@link String}, we would detect that there is + 64 * already a {@link Renderer} instance for the outer {@link Template}, and throw a {@link RendererException}. 
65 */ ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2885825080 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092450471 From mhaessig at openjdk.org Fri May 16 07:05:58 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 16 May 2025 07:05:58 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 06:56:52 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 47: >> >>> 45: /** >>> 46: * There can be at most one Renderer instance at any time. This is to avoid that users accidentally >>> 47: * render templates to strings, rather than letting them all render together. >> >> Suggestion: >> >> * There can be at most one Renderer instance at any time. This is to avoid users accidentally >> * rendering templates to separate strings, rather than letting them all render together. >> >> I do not understand the original sentence. My suggestion reflects what I understood. > > Hmm, maybe I have to use more words to be more explicit. I replaced the two lines with this: > > 45 /** > ~ 46 * There can be at most one Renderer instance at any time. > ~ 47 * > + 48 * When using nested templates, the user of the Template Framework may be tempted to first render > + 49 * the nested template to a {@link String}, and then use this {@link String} as a token in an outer > + 50 * {@link Template#body}. This would be a bad pattern: the outer and nested {@link Template} would > + 51 * be rendered separately, and could not interact. For example, the nested {@link Template} would > + 52 * not have access to the scopes of the outer {@link Template}. The inner {@link Template} could > + 53 * not access {@link Name}s and {@link Hook}s from the outer {@link Template}. 
The user might assume > + 54 * that the inner {@link Template} has access to the outer {@link Template}, but they would actually > + 55 * be separated. This could lead to unexpected behavior or even bugs. > + 56 * > + 57 * Instead, the user should create a {@link TemplateToken} from the inner {@link Template}, and > + 58 * use that {@link TemplateToken} in the {@link Template#body} of the outer {@link Template}. > + 59 * This way, the inner and outer {@link Template}s get rendered together, and the inner {@link Template} > + 60 * has access to the {@link Name}s and {@link Hook}s of the outer {@link Template}. > + 61 * > + 62 * The {@link Renderer} instance exists during the whole rendering process. Should the user ever > + 63 * attempt to render a nested {@link Template} to a {@link String}, we would detect that there is > + 64 * already a {@link Renderer} instance for the outer {@link Template}, and throw a {@link RendererException}. > 65 */ That provides much more clarity. However, the amount of text makes me think part of it should probably go into the Javadoc of the `Renderer` class to explain its usage further. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092458770 From epeter at openjdk.org Fri May 16 07:14:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 07:14:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: <0UziBkv_1Fi0YkYpVjuJ66pylx5J7anB10YA2aXEbpQ=.9b0e0172-4978-48fe-81e9-5aefae1f8dc6@github.com> On Thu, 15 May 2025 14:48:39 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typo > > Thank you for the refactoring and your patience. I like the result and its simplicity a lot. > > I found a few typos, but otherwise it looks excellent.
@mhaessig @galderz @robcasloz @chhagedorn Just again about the naming: I'm really happy with the names now. So thanks for pushing me on that, even though it was a little frustrating in the meantime. - `Template`: the template as you create it with `Template.make`, with zero or more arguments. I was always thinking of this as the true template. It is not in any sense "incomplete" at all, it having free arguments is the core feature, not some kind of "deficiency". - `template.asToken(..args..)` -> `TemplateToken`. This `TemplateToken` can now only be used as a `Token` inside other `Template`s. This makes a lot of sense, because `Template.body` expects a list of tokens. - `template.render(..args..)` -> `String`. Going directly from the `Template` with free args, and not exposing the "intermediate state" (with applied args) side-steps the discussion on what this "intermediate state" should be called. We were going on and on about that, and now we don't have to any more :) It is a good lesson for me: try to avoid exposing "intermediate states" to the public API, unless it is really strictly necessary. Good naming for "intermediate states" is tricky. Anyway: looking forward to your reviews.
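The three-way split described above (a template with free arguments, `asToken` for nesting, `render` for the final `String`) can be sketched with a deliberately tiny stand-alone model. Everything named `Toy...` below is hypothetical and written only for this illustration; it is not the framework's code, it supports just one `String` argument, and it has no hooks, names, or scopes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model only: these classes are not part of the real Template Framework.
public class ToyTemplates {
    /** A template with its argument applied; only usable as a token in another body. */
    static final class ToyToken {
        final ToyTemplate template;
        final String arg;

        ToyToken(ToyTemplate template, String arg) {
            this.template = template;
            this.arg = arg;
        }
    }

    /** A one-argument template. The body maps the argument to a list of tokens. */
    static final class ToyTemplate {
        private final Function<String, List<Object>> body;

        ToyTemplate(Function<String, List<Object>> body) {
            this.body = body;
        }

        /** Applies the argument without rendering, so the result can be nested. */
        ToyToken asToken(String arg) {
            return new ToyToken(this, arg);
        }

        /** Applies the argument and renders this template and all nested tokens together. */
        String render(String arg) {
            StringBuilder out = new StringBuilder();
            renderInto(out, arg);
            return out.toString();
        }

        private void renderInto(StringBuilder out, String arg) {
            for (Object token : body.apply(arg)) {
                if (token instanceof ToyToken) {
                    ToyToken t = (ToyToken) token;
                    t.template.renderInto(out, t.arg); // nested template, same output
                } else {
                    out.append(token); // plain string token
                }
            }
        }
    }

    static String demo() {
        ToyTemplate inner = new ToyTemplate(x -> Arrays.asList("int v = ", x, ";"));
        ToyTemplate outer = new ToyTemplate(
            name -> Arrays.asList("void ", name, "() { ", inner.asToken("42"), " }"));
        return outer.render("test");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the toy is that nothing between "template with free arguments" and "rendered `String`" needs a public name except the token used for nesting, which is the design choice the message argues for.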
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2885850441 From epeter at openjdk.org Fri May 16 07:29:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 07:29:53 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: <01bH8JnbAP-jAqlblAMBBtWmwHhv6PC-kEQ3ZLNd-FY=.f2d0f4db-01b0-4ad5-b086-9c57ad4fcebc@github.com> References: <01bH8JnbAP-jAqlblAMBBtWmwHhv6PC-kEQ3ZLNd-FY=.f2d0f4db-01b0-4ad5-b086-9c57ad4fcebc@github.com> Message-ID: On Thu, 15 May 2025 21:25:08 GMT, Vladimir Ivanov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/superword.cpp >> >> Co-authored-by: Manuel Hässig > > Impressive analysis, Emanuel! Very deep, thorough, and insightful. > > Looks good. > > Speaking of Vector API, we experimented with getting access alignment under control. Unfortunately, when it comes to on-heap accesses it boils down to hyper-aligned objects support which is not there yet. > > PS: yay, you found a way to turn PRs into blog posts! :-) @iwanowww Thanks for your kind words. Indeed: on-heap access would profit from hyper-aligned objects. Are there any ideas on how to do that? I wonder if it is worth it, or if it is good enough to just use off-heap (native) MemorySegments to guarantee alignment for very performance critical cases? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2885883207 From epeter at openjdk.org Fri May 16 07:37:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 07:37:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 07:03:12 GMT, Manuel Hässig wrote: >> Hmm, maybe I have to use more words to be more explicit. I replaced the two lines with this: >> >> 45 /** >> ~ 46 * There can be at most one Renderer instance at any time.
>> ~ 47 * >> + 48 * When using nested templates, the user of the Template Framework may be tempted to first render >> + 49 * the nested template to a {@link String}, and then use this {@link String} as a token in an outer >> + 50 * {@link Template#body}. This would be a bad pattern: the outer and nested {@link Template} would >> + 51 * be rendered separately, and could not interact. For example, the nested {@link Template} would >> + 52 * not have access to the scopes of the outer {@link Template}. The inner {@link Template} could >> + 53 * not access {@link Name}s and {@link Hook}s from the outer {@link Template}. The user might assume >> + 54 * that the inner {@link Template} has access to the outer {@link Template}, but they would actually >> + 55 * be separated. This could lead to unexpected behavior or even bugs. >> + 56 * >> + 57 * Instead, the user should create a {@link TemplateToken} from the inner {@link Template}, and >> + 58 * use that {@link TemplateToken} in the {@link Template#body} of the outer {@link Template}. >> + 59 * This way, the inner and outer {@link Template}s get rendered together, and the inner {@link Template} >> + 60 * has access to the {@link Name}s and {@link Hook}s of the outer {@link Template}. >> + 61 * >> + 62 * The {@link Renderer} instance exists during the whole rendering process. Should the user ever >> + 63 * attempt to render a nested {@link Template} to a {@link String}, we would detect that there is >> + 64 * already a {@link Renderer} instance for the outer {@link Template}, and throw a {@link RendererException}. >> 65 */ > > That provides much more clarity. However, the amount of text makes me think part of it should probably go into the Javadoc of the `Renderer` class to explain its usage further. I don't have a super strong opinion here. The whole class is package private, so the user will never see this anyway. It is more of a note to the reader of the internal code. 
I thought it made most sense to put it with the static `renderer` field, to give a justification why we have it. Alternatively, I could also make the `RendererException` message more verbose. But I don't want to burden the user with all that justification text, I just want to give the user an alternative way to get done what the user wants to get done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092501713 From duke at openjdk.org Fri May 16 07:44:00 2025 From: duke at openjdk.org (erifan) Date: Fri, 16 May 2025 07:44:00 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: <9f_-MMP_SSYInzCTUc5scRzKOKN1jj4VcnGEYWoOF14=.2eca7e44-d1df-4e57-a513-d8ddaddc9ea2@github.com> On Wed, 14 May 2025 02:44:14 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne.
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: > > - Refactor the JTReg tests for compare.xor(maskAll) > > Also made a bit change to support pattern `VectorMask.fromLong()`. > - Merge branch 'master' into JDK-8354242 > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855... Hi, I have updated the code and I'll file the patch to convert `VectorMask.fromLong(SPECIES, -1)` to `maskAll()` soon, I'll cover this test case in that patch. Would you please help review the patch again, thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2885913474 From rcastanedalo at openjdk.org Fri May 16 07:44:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 May 2025 07:44:53 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). 
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: Replace control type with PhaseCFG::is_CFG test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/a52b0730..b92500a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Fri May 16 07:51:59 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 May 2025 07:51:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 15 May 2025 13:23:21 GMT, Emanuel Peter wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend comments in zLoadP implementations to explain role of reload > > src/hotspot/share/opto/lcm.cpp line 80: > >> 78: >> 79: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { >> 80: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); > > I usually check `n->is_CFG()`. > > What is the bottom type of an `IfNode`? > `virtual const Type *bottom_type() const { return TypeTuple::IFBOTH; }` > Are you aware of that? Note that the analysis operates at the Mach level, where `Node::is_CFG()` is not complete anymore and `If` nodes have been replaced by their platform-dependent implementations.
I replaced the `n->bottom_type() != Type::CONTROL` test with `!PhaseCFG::is_CFG(n)`, which is analogous to `Node::is_CFG()` at the Mach level (and covers some additional nodes without control type that should not be moved anyway), see commit b92500a2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092524938 From mchevalier at openjdk.org Fri May 16 07:58:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 07:58:37 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 12:19:05 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Back to PRODUCT for consistency > > src/hotspot/share/opto/loopTransform.cpp line 522: > >> 520: loop_head->_peeling_opportunities_count++; >> 521: // In case of stress, let's just pick randomly... >> 522: return phase->C->random() % 2 == 0 ? estimate : 0; > > Suggestion: > > return ((phase->C->random() % 2) == 0) ? estimate : 0; Done. But does that really help you? To me it's just more confusing. It looks like "I put parentheses because it will associate in a surprising way", and then the most normal thing comes. I often wonder whether the parentheses are misplaced, or whether there is something subtle I don't get. But anyway, parentheses are in! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2092532652 From mchevalier at openjdk.org Fri May 16 07:58:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 07:58:36 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v4] In-Reply-To: References: Message-ID: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> > Adding a `StressLoopPeeling` dev flag that randomizes peeling.
> > ## Semantics > > For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. > > This requires distinguishing two things: > - not peeling because it's not legal: see for instance > ```cpp > assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); > ``` > in `PhaseIdealLoop::do_peeling` > - not peeling because it doesn't seem profitable. > > Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! > > Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel", then given enough requests we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. > > I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only saves us from some legitimate memory problems. Without a bound on the number of peeling opportunities, HotSpot eats a lot of memory, but all the allocations seem reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. > > > > ## The Flag > > The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see".
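The difference between the two bounds described under "Semantics" can be sketched with a small simulation. This is a hedged toy model, not the HotSpot implementation: the class and method names are hypothetical, and a fair coin from `java.util.Random` stands in for the `random() % 2 == 0` decision quoted from `loopTransform.cpp`.

```java
import java.util.Random;

// Toy model of two ways to limit stress peeling. Names are hypothetical.
class PeelingBoundSketch {
    // Variant A: bound the number of accepted peels ("yes, please, peel").
    // With enough requests this almost always ends exactly at the bound.
    static int boundAccepted(Random rng, int requests, int bound) {
        int peeled = 0;
        for (int i = 0; i < requests && peeled < bound; i++) {
            if (rng.nextBoolean()) {
                peeled++;
            }
        }
        return peeled;
    }

    // Variant B: bound the number of opportunities (requests considered).
    // After `bound` coin flips we give up on peeling for good, so the
    // result lies somewhere between 0 and the bound.
    static int boundOpportunities(Random rng, int requests, int bound) {
        int peeled = 0;
        int opportunities = 0;
        for (int i = 0; i < requests; i++) {
            if (opportunities >= bound) {
                break;
            }
            opportunities++;
            if (rng.nextBoolean()) {
                peeled++;
            }
        }
        return peeled;
    }

    public static void main(String[] args) {
        for (long seed = 0; seed < 5; seed++) {
            System.out.println("seed " + seed
                    + ": accepted-bound=" + boundAccepted(new Random(seed), 1000, 5)
                    + ", opportunity-bound=" + boundOpportunities(new Random(seed), 1000, 5));
        }
    }
}
```

Variant B corresponds to the scheme the description settles on: the amount of peeling varies from run to run instead of saturating at the limit.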
My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. > > But once again: let's see what happens. > > > ## On the Code > > The field `_peel... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25140/files - new: https://git.openjdk.org/jdk/pull/25140/files/a2dd68c9..5bc32b93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25140&range=02-03 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25140.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25140/head:pull/25140 PR: https://git.openjdk.org/jdk/pull/25140 From mchevalier at openjdk.org Fri May 16 07:58:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 07:58:37 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 13:11:09 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Back to PRODUCT for consistency > > One suggestion, otherwise, it looks good to me! > > Worth mentioning, this stress flag already found a bug: > [JDK-8356084](https://bugs.openjdk.org/browse/JDK-8356084) Applied requested changes. 
I have a similar opinion as @chhagedorn on "attempts", but I don't see better and I don't think it's so bad, so fine with me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25140#issuecomment-2885944627 From mchevalier at openjdk.org Fri May 16 08:00:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 08:00:37 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder [v2] In-Reply-To: <0p1F5aXmj1Uhdz1FqRjjrzRpQt6akyez77gHr-cuqZE=.17c3fb5e-28f4-4240-819d-cf22e2053d8c@github.com> References: <0p1F5aXmj1Uhdz1FqRjjrzRpQt6akyez77gHr-cuqZE=.17c3fb5e-28f4-4240-819d-cf22e2053d8c@github.com> Message-ID: On Thu, 15 May 2025 13:23:12 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix spaces > > src/hotspot/share/runtime/globals.hpp line 655: > >> 653: product(bool, DeoptimizeOnAllocationException, false, DIAGNOSTIC, \ >> 654: "Deoptimize on exception during allocation instead of using the" \ >> 655: " compiled exception handlers") \ > > For consistency with other flag definitions: > Suggestion: > > product(bool, DeoptimizeOnAllocationException, false, DIAGNOSTIC, \ > "Deoptimize on exception during allocation instead of using the " \ > "compiled exception handlers") \ Indeed, it was a bit ugly. I think I have my spaces in row now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25149#discussion_r2092538365 From mchevalier at openjdk.org Fri May 16 08:00:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 08:00:37 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder [v2] In-Reply-To: References: Message-ID: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> > Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). 
The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. > > Here, the problem is that the array allocation function, which throws when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers`, which uses compiled handlers instead of deopting, was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag has been used in testing since October 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default and add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improves the performance to match that of C1 (both for direct array allocation, and `StringBuilder` construction). For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`.
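The kind of workload discussed above can be sketched as a hot method that repeatedly allocates an array with a negative length and catches the resulting exception. This is not the reproducer from the JBS issue (which is not shown in this thread); the class and method names are hypothetical and the iteration count is arbitrary.

```java
// Hypothetical workload: a hot allocation that always throws.
class NegativeArrayLengthSketch {
    // The length is only known at run time; a negative value makes the
    // allocation throw NegativeArraySizeException.
    static int[] allocate(int length) {
        return new int[length];
    }

    // Calls the failing allocation repeatedly and counts the exceptions.
    static int countFailures(int iterations) {
        int failures = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                allocate(-1);
            } catch (NegativeArraySizeException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures(1_000_000));
    }
}
```

On a build that contains this change, one could presumably compare the default behavior against the diagnostic `DeoptimizeOnAllocationException` flag quoted in the review (unlocked with `-XX:+UnlockDiagnosticVMOptions`), but no timings for this sketch are claimed here.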
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Fix spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25149/files - new: https://git.openjdk.org/jdk/pull/25149/files/361a7727..3adc9d4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25149&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25149&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25149.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25149/head:pull/25149 PR: https://git.openjdk.org/jdk/pull/25149 From epeter at openjdk.org Fri May 16 08:03:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 08:03:55 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v30] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
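The `#name` and `$var` replacements listed above can be illustrated with a deliberately tiny stand-in. The sketch below does not use the Template Framework's classes or API; `DollarNameSketch` and `render` are hypothetical names, the substitution is plain string replacement, and the numbering scheme (`x_0`, `x_1`) is an assumption that merely mimics the `var_7` / `var_42` idea.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for '#name' and '$var' replacement. Not the framework's API.
class DollarNameSketch {
    private int nextId = 0;

    // Renders one use of a template body:
    //   '#key'  is replaced by the supplied value,
    //   '$name' is replaced by name_<id>, with a fresh id per use.
    String render(String body, Map<String, String> hashArgs) {
        int id = nextId++;
        String out = body;
        for (Map.Entry<String, String> e : hashArgs.entrySet()) {
            out = out.replace("#" + e.getKey(), e.getValue());
        }
        // "$1" refers to the captured name; "_<id>" is appended literally.
        return out.replaceAll("\\$(\\w+)", "$1_" + id);
    }

    public static void main(String[] args) {
        DollarNameSketch renderer = new DollarNameSketch();
        Map<String, String> hashArgs = new LinkedHashMap<>();
        hashArgs.put("value", "42");
        String body = "int $x = #value; $x += 1;";
        // Two uses of the same body get distinct variable names.
        System.out.println(renderer.render(body, hashArgs));
        System.out.println(renderer.render(body, hashArgs));
    }
}
```

Rendering the same body twice yields two snippets whose local variable names differ, which is the collision problem the `$var` feature is described as solving.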
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: apply offline suggestions by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/86524219..68f87cd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=28-29 Stats: 14 lines in 1 file changed: 4 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 08:06:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 08:06:00 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test Marked as reviewed by epeter (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2845835872 From rcastanedalo at openjdk.org Fri May 16 08:06:01 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 May 2025 08:06:01 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Thu, 15 May 2025 13:25:56 GMT, Emanuel Peter wrote: > Two small nits/questions, but otherwise ready from my side :) Thanks again for reviewing @eme64, I have addressed your questions now. And thanks also for your review @vnkozlov. @stefank @fisk @xmas92 @jsikstro may I get a review from the GC side? @RealFYang @TheRealMDoerr note that this PR also introduces implicit null check support for ZGC loads in RISC-V and PPC, but I cannot test it beyond GHA. May I ask you to test the changes on your respective platforms? (or let me know if you prefer to add the support in separate PRs). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2885960803 From epeter at openjdk.org Fri May 16 08:06:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 08:06:02 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:49:01 GMT, Roberto Castañeda Lozano wrote: >> src/hotspot/share/opto/lcm.cpp line 80: >> >>> 78: >>> 79: void PhaseCFG::move_node_and_its_projections_to_block(Node* n, Block* b) { >>> 80: assert(n->bottom_type() != Type::CONTROL, "cannot move control node"); >> >> I usually check `n->is_CFG()`. >> >> What is the bottom type of an `IfNode`? >> `virtual const Type *bottom_type() const { return TypeTuple::IFBOTH; }` >> Are you aware of that?
> > Note that the analysis operates at the Mach level, where `Node::is_CFG()` is not complete anymore and `If` nodes have been replaced by their platform-dependent implementations. I replaced the `n->bottom_type() != Type::CONTROL` test with `!PhaseCFG::is_CFG(n)`, which is analogous to `Node::is_CFG()` at the Mach level (and covers some additional nodes without control type that should not be moved anyway), see commit b92500a2. Ah, makes sense, did not know that. Thanks for the update! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092546157 From duke at openjdk.org Fri May 16 08:11:16 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 16 May 2025 08:11:16 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic reduces Unsafe::setMemory time considerably. > > On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below. > > Before the patch: > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ±
2.795 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30... Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains one additional commit since the last revision: RISC-V: Intrinsify Unsafe::setMemory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/51654891..0b03bb2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=08-09 Stats: 59712 lines in 1863 files changed: 38222 ins; 11277 del; 10213 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Fri May 16 08:11:16 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 16 May 2025 08:11:16 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v9] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 02:50:53 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic reduces Unsafe::setMemory time considerably. >> >> On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below. >> >> Before the patch: >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ±
0.019 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ? 2.795 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ? 0.936 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ? 0.879 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ? 0.756 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ? 0.088 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ? 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ? 0.037 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ? 0.010 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ? 0.020 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ? 0.016 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ? 0.086 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ? 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ? 0.043 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ? 0.228 ns/o... > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > fix bug and delete some useless code With the latest update, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below Benchmark (aligned) (size) Mode Cnt Score Error Units MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.387 ? 0.564 ns/op MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.690 ? 0.015 ns/op MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.680 ? 0.008 ns/op MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.051 ? 0.008 ns/op MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.751 ? 0.098 ns/op MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.805 ? 0.091 ns/op MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.328 ? 0.041 ns/op MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.688 ? 0.007 ns/op MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.602 ? 0.040 ns/op MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.590 ? 
0.020 ns/op MemorySegmentZeroUnsafe.panama true 63 avgt 30 45.911 ? 0.765 ns/op MemorySegmentZeroUnsafe.panama true 64 avgt 30 47.624 ? 0.134 ns/op MemorySegmentZeroUnsafe.panama true 255 avgt 30 60.191 ? 0.077 ns/op MemorySegmentZeroUnsafe.panama true 256 avgt 30 60.438 ? 0.394 ns/op MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.857 ? 0.055 ns/op MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.600 ? 0.105 ns/op MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.678 ? 0.010 ns/op MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.189 ? 0.145 ns/op MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.873 ? 0.243 ns/op MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.764 ? 0.091 ns/op MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.374 ? 0.087 ns/op MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.705 ? 0.025 ns/op MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.821 ? 0.148 ns/op MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.608 ? 0.202 ns/op MemorySegmentZeroUnsafe.panama false 63 avgt 30 47.337 ? 0.250 ns/op MemorySegmentZeroUnsafe.panama false 64 avgt 30 48.475 ? 0.272 ns/op MemorySegmentZeroUnsafe.panama false 255 avgt 30 61.789 ? 0.344 ns/op MemorySegmentZeroUnsafe.panama false 256 avgt 30 63.718 ? 0.549 ns/op MemorySegmentZeroUnsafe.unsafe true 1 avgt 30 19.476 ? 0.059 ns/op MemorySegmentZeroUnsafe.unsafe true 2 avgt 30 21.928 ? 0.009 ns/op MemorySegmentZeroUnsafe.unsafe true 3 avgt 30 24.410 ? 0.513 ns/op MemorySegmentZeroUnsafe.unsafe true 4 avgt 30 26.510 ? 0.577 ns/op MemorySegmentZeroUnsafe.unsafe true 5 avgt 30 26.578 ? 0.211 ns/op MemorySegmentZeroUnsafe.unsafe true 6 avgt 30 27.618 ? 0.066 ns/op MemorySegmentZeroUnsafe.unsafe true 7 avgt 30 28.820 ? 0.009 ns/op MemorySegmentZeroUnsafe.unsafe true 8 avgt 30 33.219 ? 0.021 ns/op MemorySegmentZeroUnsafe.unsafe true 15 avgt 30 33.873 ? 0.077 ns/op MemorySegmentZeroUnsafe.unsafe true 16 avgt 30 33.325 ? 0.119 ns/op MemorySegmentZeroUnsafe.unsafe true 63 avgt 30 37.172 ? 
0.721 ns/op MemorySegmentZeroUnsafe.unsafe true 64 avgt 30 38.247 ? 0.044 ns/op MemorySegmentZeroUnsafe.unsafe true 255 avgt 30 50.822 ? 0.174 ns/op MemorySegmentZeroUnsafe.unsafe true 256 avgt 30 50.696 ? 0.139 ns/op MemorySegmentZeroUnsafe.unsafe false 1 avgt 30 19.423 ? 0.008 ns/op MemorySegmentZeroUnsafe.unsafe false 2 avgt 30 22.138 ? 0.199 ns/op MemorySegmentZeroUnsafe.unsafe false 3 avgt 30 23.928 ? 0.140 ns/op MemorySegmentZeroUnsafe.unsafe false 4 avgt 30 26.939 ? 0.331 ns/op MemorySegmentZeroUnsafe.unsafe false 5 avgt 30 26.362 ? 0.066 ns/op MemorySegmentZeroUnsafe.unsafe false 6 avgt 30 27.635 ? 0.084 ns/op MemorySegmentZeroUnsafe.unsafe false 7 avgt 30 29.030 ? 0.202 ns/op MemorySegmentZeroUnsafe.unsafe false 8 avgt 30 31.390 ? 0.053 ns/op MemorySegmentZeroUnsafe.unsafe false 15 avgt 30 35.823 ? 0.133 ns/op MemorySegmentZeroUnsafe.unsafe false 16 avgt 30 37.005 ? 0.079 ns/op MemorySegmentZeroUnsafe.unsafe false 63 avgt 30 38.177 ? 0.334 ns/op MemorySegmentZeroUnsafe.unsafe false 64 avgt 30 39.484 ? 0.019 ns/op MemorySegmentZeroUnsafe.unsafe false 255 avgt 30 52.219 ? 0.176 ns/op MemorySegmentZeroUnsafe.unsafe false 256 avgt 30 53.325 ? 0.066 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2885970449 From mbaesken at openjdk.org Fri May 16 08:18:52 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 16 May 2025 08:18:52 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures In-Reply-To: References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> <84X-kth27PmvgofIVuekTKIgZMclBEdKMXyXPlcqr4E=.c2419cf7-b1cb-499c-b62d-0da512f01f50@github.com> Message-ID: On Thu, 15 May 2025 20:30:10 GMT, Lutz Schmidt wrote: >> I could live with the check, but we get other problems when it is nullptr because nullptr is interpreted as no bailout! 
>> `bool bailed_out() const { return _bailout_msg != nullptr; }`
>
> OK, with that knowledge and without the check, we crash when we try to log the null message. Otherwise, we crash or fail sometime later - or some strange things happen.
>
> If we follow the "fail early" principle, the additional check should not be there. We could as well convert the assert into a guarantee to enforce a "planned" crash.

Okay, I will remove the additional check. I think I saw the assert and thought it would be better to add a check here to be on the safe side.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25188#discussion_r2092568243

From roland at openjdk.org Fri May 16 08:33:33 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 16 May 2025 08:33:33 GMT
Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling
Message-ID:

This is an issue similar to 8349139: the type of the iv phi of a counted loop is narrowed down so a `Div` node doesn't need a control input. The loop is then peeled. The divisor of the `Div` in the loop body is guaranteed to be non-zero only if the loop body is actually executed, so the `Div` is implicitly dependent on the zero trip guard. Then the loop loses its backedge and the `Div` floats freely. The `Div` instruction is scheduled above the zero trip guard and faults. Had the `Div` been control dependent on the zero trip guard, it wouldn't have executed.

The fix, similar to the one for 8349139, is to add a `CastII` on peeling to make explicit the dependency between the zero trip guard and whatever in the loop body relies on the narrowed-down type of the iv phi.
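
For readers unfamiliar with the pattern, the following is a minimal Java sketch of the general loop shape the description above refers to. It is a hypothetical illustration only: it is not the regression test added by the pull request, the class and method names are made up, and nothing here claims that C2 peels this particular loop or that it reproduces the fault. It only shows why a division inside a counted loop can be safe purely because of the induction variable's range.

```java
// Hypothetical illustration; not the test case from the pull request.
final class PeelDivSketch {
    // The induction variable i starts at 1 and only increases, so inside the
    // loop body it is never zero and x / i cannot throw ArithmeticException.
    // That guarantee holds only while the body actually executes; if stop <= 1
    // the body must not run at all.
    static int sumOfQuotients(int x, int stop) {
        int sum = 0;
        for (int i = 1; i < stop; i++) {
            sum += x / i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfQuotients(12, 5)); // 12/1 + 12/2 + 12/3 + 12/4 = 25
        System.out.println(sumOfQuotients(12, 1)); // 0, the loop body never runs
    }
}
```

In source terms the division is always guarded by the loop condition. The message above is about keeping that guard visible to the compiler's scheduler once the loop structure itself has been transformed away.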

-------------

Commit messages:
 - test
 - fix

Changes: https://git.openjdk.org/jdk/pull/25262/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25262&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8350329
  Stats: 62 lines in 2 files changed: 60 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/25262.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25262/head:pull/25262

PR: https://git.openjdk.org/jdk/pull/25262

From thartmann at openjdk.org Fri May 16 08:34:55 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 16 May 2025 08:34:55 GMT
Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3]
In-Reply-To:
References:
Message-ID:

On Fri, 16 May 2025 07:53:41 GMT, Marc Chevalier wrote:

>> src/hotspot/share/opto/loopTransform.cpp line 522:
>>
>>> 520:     loop_head->_peeling_opportunities_count++;
>>> 521:     // In case of stress, let's just pick randomly...
>>> 522:     return phase->C->random() % 2 == 0 ? estimate : 0;
>>
>> Suggestion:
>>
>>     return ((phase->C->random() % 2) == 0) ? estimate : 0;
>
> Done. But does that really helps you? To me it's just more confusing. It looks like "I put parentheses because it will associate in a surprising way", and then the most normal things comes. I often wonder whether the parentheses are misplaced, or whether there is something subtle I don't get.
> But anyway, parentheses are in!

I think it just helps readability because you don't need to think about operator precedence. But it surely is personal preference.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2092590771 From thartmann at openjdk.org Fri May 16 08:34:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 May 2025 08:34:54 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v4] In-Reply-To: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> References: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> Message-ID: <-yeys6cDJ-XQ2antyRX8B9BZcs87q2El7IOGMvriFbU=.743563f3-1707-4dc6-b2d2-09b13dfd99d1@github.com> On Fri, 16 May 2025 07:58:36 GMT, Marc Chevalier wrote: >> Adding a `StressLoopPeeling` dev flag that randomize peeling. >> >> ## Semantics >> >> For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. >> >> This requires to distinguish two things: >> - not inlining because it's not legal: see for instance >> ```cpp >> assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); >> ``` >> in `PhaseIdealLoop::do_peeling` >> - not inlining because it doesn't seem profitable. >> >> Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! >> >> Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. 
>> >> I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. >> >> >> >> ## The Flag >> >> The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. >> >> But once again: let'... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Thanks for making these changes, looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 

PR Review: https://git.openjdk.org/jdk/pull/25140#pullrequestreview-2845907126

From thartmann at openjdk.org Fri May 16 08:34:56 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 16 May 2025 08:34:56 GMT
Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 15 May 2025 13:48:53 GMT, Christian Hagedorn wrote:

>> src/hotspot/share/opto/loopnode.hpp line 137:
>>
>>> 135:
>>> 136: #ifndef PRODUCT
>>> 137:   uint _peeling_opportunities_count = 0;
>>
>> I think this should rather be named `_stress_peeling_attempts` or something.
>
> I thought about proposing "attempts" as well. But it sounds like we are actually trying loop peeling and then somehow fail which is not the case here - we just flipped a coin and then decided not to do peeling. Anyway, I don't have a strong opinion here :-)

Right, it's not a perfect name either but let's not bikeshed too much - I'm fine either way.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25140#discussion_r2092595184

From aboldtch at openjdk.org Fri May 16 08:40:56 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 16 May 2025 08:40:56 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID:

On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano wrote:

>> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
>>
>> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands:
>>
>> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753)
>>
>> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>>
>> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks.
>>
>> #### Testing
>> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode).
>
> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision:
>
>   Replace control type with PhaseCFG::is_CFG test

The GC changes look good. I only took a cursory look at the ADLC and C2 changes, but nothing stands out. I only had a small comment about `legitimize_address_requires_lea`.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 141:

> 139:   Address legitimize_address(const Address &a, int size, Register scratch) {
> 140:     if (a.getMode() == Address::base_plus_offset) {
> 141:       if (legitimize_address_requires_lea(a, size)) {

It is a little strange that `legitimize_address_requires_lea` is only the second condition and not

    return a.getMode() == Address::base_plus_offset &&
           !Address::offset_ok_for_immed(a.offset(), exact_log2(size));

And have the check in `legitimize_address` simply be `if (legitimize_address_requires_lea(a, size))`

I guess we never end up calling `legitimize_address_requires_lea` with a literal address, where it would assert in `a.offset()`. But requiring the Address parameter of legitimize_address_requires_lea to be in a specific mode as a precondition seems weird to me.

-------------

PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2845912788
PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092596572

From chagedorn at openjdk.org Fri May 16 09:13:06 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 16 May 2025 09:13:06 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v30]
In-Reply-To:
References:
Message-ID:

On Fri, 16 May 2025 08:03:55 GMT, Emanuel Peter wrote:

>> **Goal**
>> We want to generate Java source code:
>> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
>> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>>
>> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > apply offline suggestions by Christian Some first comments for `Template`, will continue with the other files :-) And thanks for bearing with us! I'm now also very happy about the naming and design. It was well worth to discuss the names and the design more in-depth. The result is very good now I think! :-) test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 229: > 227: > 228: /** > 229: * Renders the {@link Template} to {@link String}. Suggestion: * Renders the {@link Template} to a {@link String}. Could also be changed at the other `render()` methods. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 410: > 408: */ > 409: public String render(float fuel, A a, B b, C c) { > 410: return new TemplateToken.ThreeArgs(this, a, b, c).render(fuel); Suggestion: return new TemplateToken.ThreeArgs<>(this, a, b, c).render(fuel); test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 413: > 411: } > 412: } > 413: Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 612: > 610: > 611: /** > 612: * Default amount of fuel for Template rendering. It guides the nesting depth of Templates. Maybe add here how to change: Suggestion: * Default amount of fuel for Template rendering. It guides the nesting depth of Templates. Can be changed when * rendering a template with {@code render(fuel)} (e.g. {@link ZeroArgs#render(float)}). test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 618: > 616: /** > 617: * The default amount of fuel spent per Template. 
It is subtracted from the current {@link #fuel} at every > 618: * nesting level, and once the {@link #fuel} reaches zero, the nesting is supposed to terminate. Same here: Suggestion: * nesting level, and once the {@link #fuel} reaches zero, the nesting is supposed to terminate. Can be changed * with {@link #setFuelCost(float)} inside {@link #body(Object...)}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 620: > 618: * nesting level, and once the {@link #fuel} reaches zero, the nesting is supposed to terminate. > 619: */ > 620: public final static float DEFAULT_FUEL_COST = 10.0f; Suggestion: float DEFAULT_FUEL = 100.0f; /** * The default amount of fuel spent per Template. It is subtracted from the current {@link #fuel} at every * nesting level, and once the {@link #fuel} reaches zero, the nesting is supposed to terminate. */ float DEFAULT_FUEL_COST = 10.0f; test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 631: > 629: * Example of a recursive Template, which checks the remaining {@link #fuel} at every level, > 630: * and terminates if it reaches zero. It also demonstrates the use of {@link TemplateBinding} for > 631: * the recursive use of Templates. We {@link Template.OneArgs#render} with {@code 30} total fuel, and spend {@code 5} fuel at each recursion level. Long line can be split: Suggestion: * the recursive use of Templates. We {@link Template.OneArgs#render} with {@code 30} total fuel, * and spend {@code 5} fuel at each recursion level. * *

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 640: > 638: * System.out.println("Currently at depth #depth with fuel #fuel"); > 639: * """, > 640: * (fuel() > 0) ? binding.get().asToken(depth + 1) Missing `:`: Suggestion: * (fuel() > 0) ? binding.get().asToken(depth + 1) : test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 675: > 673: * @return The token that performs the defining action. > 674: */ > 675: static Token addName(Name name) { I have not fully grasped the concept of these `Name`s, yet. The `generateWithNames()` example in `TestTutorial` is quite big and complex. Could you add there a simpler example to better understand this idea of names? ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2845694798 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092554087 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092561787 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092562738 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092621482 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092625480 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092618666 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092637966 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092629499 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092658089 From chagedorn at openjdk.org Fri May 16 09:13:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 09:13:14 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v29] In-Reply-To: <4PNaHeq3MI5kRKh2x8LgdF7KVQdzKJJzfu9IACI3jKM=.b9908567-e6d4-4e46-b999-bca774ac24c5@github.com> References: <4PNaHeq3MI5kRKh2x8LgdF7KVQdzKJJzfu9IACI3jKM=.b9908567-e6d4-4e46-b999-bca774ac24c5@github.com> Message-ID: On Fri, 16 May 2025 07:06:04 GMT, Christian 
Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Elaborate why there is only one Renderer > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 88: > >> 86: * To get an executable test, we define a Template that produces a class body with a main method. The Template >> 87: * takes a list of types, and calls the {@code testTemplate} defined above for each type and operator. We use >> 88: * the {@code TestFramework} to call our {@code @Test} methods. > > For the first mention, you can also add a link: > Suggestion: > > * the {@link TestFramework} to call our {@code @Test} methods. > > Also requires: > > import compiler.lib.ir_framework.TestFramework; Just a side note, maybe it's time to bite the bullet and finally rename `TestFramework` to `IRFramework` given that we have many frameworks now designed for tests... :shrug: ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092465262 From chagedorn at openjdk.org Fri May 16 09:13:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 09:13:13 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v29] In-Reply-To: References: Message-ID: <4PNaHeq3MI5kRKh2x8LgdF7KVQdzKJJzfu9IACI3jKM=.b9908567-e6d4-4e46-b999-bca774ac24c5@github.com> On Fri, 16 May 2025 07:02:50 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Elaborate why there is only one Renderer test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 53: > 51: * > 52: *

> 53: * Once we rendered the source code to a {@link String}, we can compile it with the {@code CompileFramework}. You could also link the `CompileFramework`: Suggestion: * Once we rendered the source code to a {@link String}, we can compile it with the {@link CompileFramework}. This would require an additional: import compiler.lib.compile_framework.CompileFramework; test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 86: > 84: * > 85: *

> 86: * To get an executable test, we define a Template that produces a class body with a main method. The Template Suggestion: * To get an executable test, we define a {@link Template} that produces a class body with a main method. The Template test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 88: > 86: * To get an executable test, we define a Template that produces a class body with a main method. The Template > 87: * takes a list of types, and calls the {@code testTemplate} defined above for each type and operator. We use > 88: * the {@code TestFramework} to call our {@code @Test} methods. For the first mention, you can also add a link: Suggestion: * the {@link TestFramework} to call our {@code @Test} methods. Also requires: import compiler.lib.ir_framework.TestFramework; test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 111: > 109: * """, > 110: * // Call the testTemplate for each type and operator, generating a > 111: * // list of lists of (Template) Tokens: Since we eventually settled on exposing `TemplateToken`, I guess you can name it as such here: Suggestion: * // list of lists of TemplateToken: test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 124: > 122: *

> 123: * Finally, we generate the list of types, and pass it to the class template: > 124: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 143: > 141: * Details: > 142: *

> 143: * A Template can have zero or more arguments. A template can be created with {@code make} methods like Suggestion: * A {@link Template} can have zero or more arguments. A template can be created with {@code make} methods like test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 145: > 143: * A Template can have zero or more arguments. A template can be created with {@code make} methods like > 144: * {@link Template#make(String, Function)}. For each number of arguments there is an implementation > 145: * (e.g. {@code Template.TwoArgs} for two arguments). This allows the use of Generics for the Suggestion: * (e.g. {@link Template.TwoArgs} for two arguments). This allows the use of Generics for the test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 146: > 144: * {@link Template#make(String, Function)}. For each number of arguments there is an implementation > 145: * (e.g. {@code Template.TwoArgs} for two arguments). This allows the use of Generics for the > 146: * Template argument types, i.e. the Template arguments can be type checked. Suggestion: * Template argument types which enables type checking of the Template arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 151: > 149: * A {@link Template} can be rendered to a {@link String} (e.g. {@link Template.ZeroArgs#render()}). > 150: * Alternatively, we can generate a {@link Token} (e.g. {@link Template.ZeroArgs#asToken()}), > 151: * and use the {@link Token} inside another {@link Template#body}. Maybe we can mention here that it's actually a `TemplateToken`: Suggestion: * Alternatively, we can generate a {@link Token} (more specifically, a {@link TemplateToken}) with {@code asToken()} * (e.g. {@link Template.ZeroArgs#asToken()}), and use the {@link Token} inside another {@link Template#body}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 168: > 166: * > 167: *

> 168: * A {@link TemplateToken} can not just be used in {@link Template#body}, but but it can also be Suggestion: * A {@link TemplateToken} can not just be used in {@link Template#body}, but it can also be test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 178: > 176: * with a certain amount of {@link #fuel}, which is decreased at each Template nesting by a certain amount > 177: * (can be changed with {@link #setFuelCost}). Recursive templates are supposed to terminate once the {@link #fuel} > 178: * is depleted (i.e. reaches zero). I suggest to also mention the default fuel and how to change that once you start rendering. Maybe something like this? Suggestion: * with a certain amount of {@link #fuel} (default: 100, see {@link #DEFAULT_FUEL}), which is decreased at each * Template nesting by a certain amount (default: 10, see {@link #DEFAULT_FUEL_COST}). The default fuel for a * template can be changed when we {@code render()} it (e.g. {@link ZeroArgs#render(float)}) and the default * fuel cost with {@link #setFuelCost}) when defining the {@link #body(Object...)}. Recursive templates are * supposed to terminate once the {@link #fuel} is depleted (i.e. reaches zero). 
``` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092455409 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092459709 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092462635 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092470827 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092472111 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092474294 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092476166 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092477901 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092494798 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092498957 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092516132 From chagedorn at openjdk.org Fri May 16 09:19:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 09:19:53 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v4] In-Reply-To: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> References: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> Message-ID: On Fri, 16 May 2025 07:58:36 GMT, Marc Chevalier wrote: >> Adding a `StressLoopPeeling` dev flag that randomize peeling. >> >> ## Semantics >> >> For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. >> >> This requires to distinguish two things: >> - not inlining because it's not legal: see for instance >> ```cpp >> assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); >> ``` >> in `PhaseIdealLoop::do_peeling` >> - not inlining because it doesn't seem profitable. 
>> >> Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! >> >> Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. >> >> I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. >> >> >> >> ## The Flag >> >> The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. 
>> >> But once again: let'... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25140#pullrequestreview-2846050287 From rcastanedalo at openjdk.org Fri May 16 09:30:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 16 May 2025 09:30:57 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 15 May 2025 12:37:26 GMT, Roland Westrelin wrote: > > Thanks for working on this, Roland! A "dumb" question: could the issue also be addressed by ensuring that dead allocations are removed earlier (e.g. in the call to `PhaseMacroExpand::eliminate_allocate_node` performed as part of escape analysis/scalar replacement, before loop optimizations)? It seems this would also prevent the miscompilations in `TestEliminationOfAllocationWithoutUse`, no? > > I don't thing that would work (but haven't tried). The problem is that the memory graph is broken around `Initialize` nodes from the time they are added to the IR (parse time) and that's only exposed once they are removed from the graph. But I don't think it matters when they are removed. I gave it a quick try (https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58) and removing dead array allocations early (which are trivially detected as non-escaping but marked as non-scalar replaceable) is sufficient to fix all failures in `TestEliminationOfAllocationWithoutUse`. 
I have not tested the patch thoroughly and it might be that the dead test is too weak or I am missing something else, but this seems to me something worth exploring before committing to the solution proposed in this PR. > What you suggest does also sound quite conservative: it could very well be that an allocation looses all its uses after some rounds of optimization but in the scheme you suggest, that allocation wouldn't be optimized out. Note that I am not necessarily suggesting disabling "late" elimination of allocations at macro expansion. But it would be good, in light of the above findings, to find actual cases where the seemingly simpler alternative of removing dead allocations early is not sufficient for correctness, to motivate the more complex approach proposed in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2886175661 From aph at openjdk.org Fri May 16 09:31:03 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 16 May 2025 09:31:03 GMT Subject: Integrated: 8354674: AArch64: Intrinsify Unsafe::setMemory In-Reply-To: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Fri, 9 May 2025 14:11:27 GMT, Andrew Haley wrote: > This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>
> Benchmark                       (aligned)  (size)  Mode  Cnt     Score    Error  Units
> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±  0.422  ns/op
> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ± 80.161  ns/op
> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±  0.180  ns/op
> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±  0.232  ns/op
This pull request has now been integrated. Changeset: a6ebcf61 Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/a6ebcf61eb522a1bcfc9f2169d42974af3883b00 Stats: 127 lines in 3 files changed: 122 ins; 0 del; 5 mod 8354674: AArch64: Intrinsify Unsafe::setMemory Reviewed-by: adinn ------------- PR: https://git.openjdk.org/jdk/pull/25147 From chagedorn at openjdk.org Fri May 16 09:32:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 09:32:01 GMT Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling In-Reply-To: References: Message-ID: <_4lQuZ3SJIv6N52RBHUYXQ0VRXwuZSzBbwUWcu7QvXM=.8da9e8bb-7e94-4cc3-8722-72bc2e670326@github.com> On Fri, 16 May 2025 08:28:20 GMT, Roland Westrelin wrote: > This is an issue similar to 8349139: the type of the iv phi of a > counted loop is narrowed down so a `Div` node doesn't need a control > input. The loop is then peeled. The `Div` in the loop body is > guaranteed to be non zero only if it is actually executed so the `Div` > is implicitly dependent on the zero trip guard. Then the loop looses > its backedge and the `Div` freely floats. The `Div` instruction is > scheduled above the zero trip guard and faults. Had the `Div` been > control dependent on the zero trip guard, it wouldn't have > executed. The fix, similar to 8349139 is to add a `CastII` on peeling > to make the dependency between what's in the loop body and relies on > the narrowed down type of the iv phi and the zero trip guard explicit. Looks good to me.
test/hotspot/jtreg/compiler/controldependency/TestPeeledLoopNoBackedgeFloatingDiv.java line 29: > 27: * @summary C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling > 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseLoopPredicate -XX:+StressGCM -XX:StressSeed=31780379 TestPeeledLoopNoBackedgeFloatingDiv > 29: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseLoopPredicate -XX:+StressGCM TestPeeledLoopNoBackedgeFloatingDiv You should use `-XX:+UnlockDiagnosticVMOptions` for `StressGCM` and maybe also add `-XX:+IgnoreUnrecognizedVMOptions` when run without C2 since `UseLoopPredicate` is a C2 only flag. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25262#pullrequestreview-2846080792 PR Review Comment: https://git.openjdk.org/jdk/pull/25262#discussion_r2092696865 From mdoerr at openjdk.org Fri May 16 09:36:12 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 16 May 2025 09:36:12 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads.
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. >> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test Thanks for implementing it and thanks for the ping.
It basically works on PPC64, but one IR rule is failing: Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "static java.lang.Object compiler.gcbarriers.TestImplicitNullChecks.testLoadVolatile(compiler.gcbarriers.TestImplicitNullChecks$OuterWithVolatileField)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={FINAL_CODE}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#NULL_CHECK#_", "1"}, applyIfPlatformOr={}, applyIfPlatform={"aarch64", "false"}, failOn={}, applyIfOr={"UseZGC", "true", "UseG1GC", "true"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "Final Code": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(NullCheck.*)+(\s){2}===.*)" - Failed comparison: [found] 0 = 1 [given] - No nodes matched! This is probably because PPC64 uses a membar_volatile before volatile load, so the graph looks differently: 33 Prolog === [[ ]] [2380000000033] 9 MachProj === 10 [[ 8 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) R3 11 MachProj === 10 [[ 8 26 ]] #5 Oop:compiler/gcbarriers/TestImplicitNullChecks$OuterWithVolatileField * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 12 MachProj === 10 [[ 4 17 ]] #1/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 13 MachProj === 10 [[ 4 21 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) R1 14 MachProj === 10 [[ 4 2 17 ]] #3 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 15 MachProj === 10 [[ 4 17 ]] #4 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) 0 Con === 10 [[ ]] #top 8 zeroCheckP_reg_imm0 === 9 11 [[ 7 22 ]] P=0.000001, C=-1.000000 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) BB#002: 31 Region === 31 22 [[ 31 21 26 ]] 21 membar_volatile === 31 0 13 0 0 [[ 20 23 ]] !jvms: 
TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 20 MachProj === 21 [[ 19 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 23 MachProj === 21 [[ 19 26 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) R15 26 loadN_ac === 31 23 11 [[ 25 19 ]] #12/0x000000000000000c Volatile!narrowoop: java/lang/Object * 19 unnecessary_membar_acquire === 20 0 23 0 0 |26 0 [[ 18 24 ]] !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 18 MachProj === 19 [[ 17 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 24 MachProj === 19 [[ 17 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) R3 25 decodeN_unscaled === _ 26 [[ 17 ]] java/lang/Object * Oop:java/lang/Object * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 34 Epilog === [[ ]] [2380000000034] 17 Ret === 18 12 24 14 15 25 [[ 1 ]] BB#003: 30 Region === 30 7 [[ 30 4 ]] R3 16 loadConI16 === 1 [[ 4 ]] #-10/0xfffffff6 6 ConP === 10 [[ 4 ]] #null 4 CallStaticJavaDirect === 30 12 13 14 15 16 0 6 [[ 5 3 32 ]] Static wrapper for: uncommon_trap(reason='null_check' action='maybe_recompile') # void ( int ) C=0.000100 TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) reexecute !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) 5 MachProj === 4 [[ ]] #10005/fat 3 MachProj === 4 [[ 2 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) R14 32 MachProj === 4 [[ ]] #6/fat 2 ShouldNotReachHere === 3 0 0 14 0 [[ 1 ]] !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. 
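To make the `[found] 0 = 1 [given]` result concrete, here is a minimal, self-contained sketch (not part of the PR) that applies the constraint regex from the failure message to two nodes taken from the PPC64 dump above. The class name `NullCheckRuleSketch`, the two-space column separators (whitespace is collapsed in this archive), and the `NullCheck` line are assumptions made for illustration only; the IR framework itself is not used here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NullCheckRuleSketch {
    // Constraint regex copied from the failure message above.
    static final Pattern NULL_CHECK =
        Pattern.compile("(\\d+(\\s){2}(NullCheck.*)+(\\s){2}===.*)");

    // Two nodes from the PPC64 dump; the two-space separators are assumed.
    static final String PPC64_NODES =
        "8  zeroCheckP_reg_imm0  === 9 11  [[ 7 22 ]]\n" +
        "26  loadN_ac  === 31 23 11  [[ 25 19 ]]\n";

    // Hypothetical node line, only to show what a matching line looks like.
    static final String HYPOTHETICAL_IMPLICIT_CHECK =
        "8  NullCheck  === 9 26  [[ 7 22 ]]\n";

    // Counts how many times the constraint matches in a final-code dump.
    static int countMatches(String dump) {
        Matcher m = NULL_CHECK.matcher(dump);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(PPC64_NODES));                 // prints 0
        System.out.println(countMatches(HYPOTHETICAL_IMPLICIT_CHECK)); // prints 1
    }
}
```

With the explicit `zeroCheckP_reg_imm0` still in the graph and no `NullCheck` node, the count is 0, which is what the rule reports on PPC64.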
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2886188487 From epeter at openjdk.org Fri May 16 09:49:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 09:49:59 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v29] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 07:02:50 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. 
If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Elaborate why there is only one Renderer test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 31: > 29: > 30: import java.util.List; > 31: Suggestion: import compiler.lib.compile_framework.CompileFramework; import compiler.lib.ir_framework.TestFramework; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092721174 From epeter at openjdk.org Fri May 16 09:50:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 09:50:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v29] In-Reply-To: References: <4PNaHeq3MI5kRKh2x8LgdF7KVQdzKJJzfu9IACI3jKM=.b9908567-e6d4-4e46-b999-bca774ac24c5@github.com> Message-ID: On Fri, 16 May 2025 07:08:02 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 88: >> >>> 86: * To get an executable test, we define a Template that produces a class body with a main method. 
The Template >>> 87: * takes a list of types, and calls the {@code testTemplate} defined above for each type and operator. We use >>> 88: * the {@code TestFramework} to call our {@code @Test} methods. >> >> For the first mention, you can also add a link: >> Suggestion: >> >> * the {@link TestFramework} to call our {@code @Test} methods. >> >> Also requires: >> >> import compiler.lib.ir_framework.TestFramework; > > Just a side note, maybe it's time to bite the bullet and finally rename `TestFramework` to `IRFramework` given that we have many frameworks now designed for tests... :shrug: I don't know if that is really necessary. It already does more than "IR checks": - It finds all test methods `@Test` - helps with `@Setup` etc. - And I hope one day we can put in automatic result verification as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092725980 From mchevalier at openjdk.org Fri May 16 09:56:56 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 09:56:56 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder [v2] In-Reply-To: <0p1F5aXmj1Uhdz1FqRjjrzRpQt6akyez77gHr-cuqZE=.17c3fb5e-28f4-4240-819d-cf22e2053d8c@github.com> References: <0p1F5aXmj1Uhdz1FqRjjrzRpQt6akyez77gHr-cuqZE=.17c3fb5e-28f4-4240-819d-cf22e2053d8c@github.com> Message-ID: On Thu, 15 May 2025 13:34:55 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix spaces > > Looks reasonable to me. I'm ready for new review (hopefully approval): since last review, I just fixed the spaces as @chhagedorn pointed out. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25149#issuecomment-2886242818 From epeter at openjdk.org Fri May 16 09:59:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 09:59:51 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v31] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/68f87cd2..6b83ea39 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=29-30 Stats: 28 lines in 1 file changed: 8 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 10:03:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 10:03:03 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v30] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 08:46:48 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull 
request incrementally with one additional commit since the last revision: >> >> apply offline suggestions by Christian > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 620: > >> 618: * nesting level, and once the {@link #fuel} reaches zero, the nesting is supposed to terminate. >> 619: */ >> 620: public final static float DEFAULT_FUEL_COST = 10.0f; > > Suggestion: > > float DEFAULT_FUEL = 100.0f; > > /** > * The default amount of fuel spent per Template. It is subtracted from the current {@link #fuel} at every > * nesting level, and once the {@link #fuel} reaches zero, the nesting is supposed to terminate. > */ > float DEFAULT_FUEL_COST = 10.0f; removed the `public final static` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092747498 From epeter at openjdk.org Fri May 16 10:10:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 10:10:44 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v32] In-Reply-To: References: Message-ID: <32OxhVRhwuY_Flt3Dmo-mcU5ruQIptcC2lBATGpQdZc=.ceeb5e58-b083-445d-a7dd-131380c75508@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). 
We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix up review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/6b83ea39..f655f139 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=30-31 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 10:18:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 10:18:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v30] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 09:09:07 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> apply offline suggestions by Christian > > Some first comments for `Template`, will continue with the other files :-) > > And thanks for bearing with us! I'm also very happy about the naming and design we found together. It was well worth to discuss everything more in-depth. The result is very good now I think! :-) @chhagedorn I applied all your suggestions. Let's discuss offline about what additional tutorial / tests you would like to have for the `Name`s. > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 675: > >> 673: * @return The token that performs the defining action. >> 674: */ >> 675: static Token addName(Name name) { > > I have not fully grasped the concept of these `Name`s, yet. The `generateWithNames()` example in `TestTutorial` is quite big and complex. Could you add there a simpler example to better understand this idea of names? Let's discuss offline to see what would be good to have here. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2886296861 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092770855 From mbaesken at openjdk.org Fri May 16 10:23:42 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 16 May 2025 10:23:42 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures [v2] In-Reply-To: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: > We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be check to avoid crashes. Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Update c1_Compilation.cpp - remove null check in bailout ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25188/files - new: https://git.openjdk.org/jdk/pull/25188/files/4da9b834..baab8c34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25188&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25188&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25188.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25188/head:pull/25188 PR: https://git.openjdk.org/jdk/pull/25188 From thartmann at openjdk.org Fri May 16 10:29:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 May 2025 10:29:51 GMT Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling In-Reply-To: References: Message-ID: On Fri, 16 May 2025 08:28:20 GMT, Roland Westrelin wrote: > This is an issue similar to 8349139: the type of the iv phi of a > counted loop is narrowed down so a `Div` node doesn't need a control > input. 
The loop is then peeled. The `Div` in the loop body is > guaranteed to be non zero only if it is actually executed so the `Div` > is implicitly dependent on the zero trip guard. Then the loop looses > its backedge and the `Div` freely floats. The `Div` instruction is > scheduled above the zero trip guard and faults. Had the `Div` been > control dependent on the zero trip guard, it wouldn't have > executed. The fix, similar to 8349139 is to add a `CastII` on peeling > to make the dependency between what's in the loop body and relies on > the narrowed down type of the iv phi and the zero trip guard explicit. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25262#pullrequestreview-2846220936 From thartmann at openjdk.org Fri May 16 10:48:02 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 May 2025 10:48:02 GMT Subject: RFR: 8351568: Improve source code documentation for PhaseCFG::insert_anti_dependences [v7] In-Reply-To: References: <3xXLZZOHl6oejisEzmNv206aQo4y6FuJoWhsOO_GWqM=.682d7701-baa2-4654-8216-e4de526456d1@github.com> <_jqax2exjj4DnvqP-lVK4kiwJ59C0XS6B8DE6quAHGc=.579945f7-36a6-49f8-9b04-c0fe63f60a5f@github.com> <4n5OGLVPn8sEuDgcJqZ5oKco3N_trnSxHNwyBawRQF4=.fe8ecf59-fb55-49c9-b8da-99efee63dde4@github.com> <21d6dUz886V6-TbWTNyt22KX6UaBNbsfZQz3hnQVjNA=.edb3bbd3-0dba-4e54-92cf-88e680cc5149@github.com> Message-ID: On Thu, 15 May 2025 12:38:26 GMT, Daniel Lundén wrote: >> Ah right, I missed that. > > @robcasloz suggested that I simply add the new assert to the commit just before the fix which `test4` is a regression test for (much simpler than trying to revert the fix in mainline), and check if it triggers when the old assert triggers. When I manually disable loop strip mining verification (as Tobias suggested), the new assert triggers, as expected, whenever the old assert triggers. Resolving this thread now! Great, thanks for double checking!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24926#discussion_r2092812813 From thartmann at openjdk.org Fri May 16 10:52:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 May 2025 10:52:52 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v5] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Fri, 16 May 2025 05:12:38 GMT, kuaiwei wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Minor change Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25081#pullrequestreview-2846270976 From thartmann at openjdk.org Fri May 16 10:55:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 May 2025 10:55:52 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder [v2] In-Reply-To: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> References: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> Message-ID: On Fri, 16 May 2025 08:00:37 GMT, Marc Chevalier wrote: >> Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). 
The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. >> >> Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers` to rather use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag is used in testing since october 2022. >> >> So maybe it's time to use the compiled exception handlers! I propose to turn them on by default, and instead, add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improve the performance to match the ones of C1 (both for direct array allocation, and `StringBuilder` construction). For instance, with the case given in the JBS issue: >> >> Stop at level 0 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,277s >> user 0m4,214s >> sys 0m0,117s >> >> Stop at level 1 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,104s >> user 0m4,079s >> sys 0m0,106s >> >> Stop at level 2 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,308s >> user 0m4,239s >> sys 0m0,145s >> >> Stop at level 3 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,304s >> user 0m4,247s >> sys 0m0,132s >> >> Default (Stop at level 4) >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,086s >> user 0m4,059s >> sys 0m0,122s >> >> >> >> I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fix spaces Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25149#pullrequestreview-2846276527 From chagedorn at openjdk.org Fri May 16 10:59:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 10:59:53 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder [v2] In-Reply-To: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> References: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> Message-ID: On Fri, 16 May 2025 08:00:37 GMT, Marc Chevalier wrote: >> Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. >> >> Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers` to rather use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag is used in testing since october 2022. >> >> So maybe it's time to use the compiled exception handlers! I propose to turn them on by default, and instead, add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improve the performance to match the ones of C1 (both for direct array allocation, and `StringBuilder` construction). 
For instance, with the case given in the JBS issue: >> >> Stop at level 0 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,277s >> user 0m4,214s >> sys 0m0,117s >> >> Stop at level 1 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,104s >> user 0m4,079s >> sys 0m0,106s >> >> Stop at level 2 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,308s >> user 0m4,239s >> sys 0m0,145s >> >> Stop at level 3 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,304s >> user 0m4,247s >> sys 0m0,132s >> >> Default (Stop at level 4) >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,086s >> user 0m4,059s >> sys 0m0,122s >> >> >> >> I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fix spaces Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25149#pullrequestreview-2846284713 From chagedorn at openjdk.org Fri May 16 11:05:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 11:05:00 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v29] In-Reply-To: References: <4PNaHeq3MI5kRKh2x8LgdF7KVQdzKJJzfu9IACI3jKM=.b9908567-e6d4-4e46-b999-bca774ac24c5@github.com> Message-ID: On Fri, 16 May 2025 09:46:45 GMT, Emanuel Peter wrote: >> Just a side note, maybe it's time to bite the bullet and finally rename `TestFramework` to `IRFramework` given that we have many frameworks now designed for tests... :shrug: > > I don't know if that is really necessary. It already does more than "IR checks": > - It finds all test methods `@Test` > - helps with `@Setup` etc. > - And I hope one day we can put in automatic result verification as well. 
That's true, you can do more stuff than IR matching. That's the reason why we named it `TestFramework` back there but in everyday discussions, we always talk about the "IR Framework". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092835341 From mchevalier at openjdk.org Fri May 16 11:35:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 11:35:01 GMT Subject: RFR: 8353638: C2: deoptimization and re-execution cycle with StringBuilder [v2] In-Reply-To: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> References: <2MlUsHBQCkS4rtBcpTKW_1S_Gi45AU8R8XXF3PvD_Gc=.1185ceb2-e899-41b5-860a-f1d0b8a94ee7@github.com> Message-ID: On Fri, 16 May 2025 08:00:37 GMT, Marc Chevalier wrote: >> Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. >> >> Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers` to rather use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag is used in testing since october 2022. >> >> So maybe it's time to use the compiled exception handlers! I propose to turn them on by default, and instead, add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improve the performance to match the ones of C1 (both for direct array allocation, and `StringBuilder` construction). 
For instance, with the case given in the JBS issue: >> >> Stop at level 0 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,277s >> user 0m4,214s >> sys 0m0,117s >> >> Stop at level 1 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,104s >> user 0m4,079s >> sys 0m0,106s >> >> Stop at level 2 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,308s >> user 0m4,239s >> sys 0m0,145s >> >> Stop at level 3 >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,304s >> user 0m4,247s >> sys 0m0,132s >> >> Default (Stop at level 4) >> CompileCommand: compileonly C.test* bool compileonly = true >> >> real 0m4,086s >> user 0m4,059s >> sys 0m0,122s >> >> >> >> I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Fix spaces And thanks again @TobiHartmann and @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25149#issuecomment-2886459895 From mchevalier at openjdk.org Fri May 16 11:35:02 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 11:35:02 GMT Subject: RFR: 8355488: Add stress mode for C2 loop peeling [v4] In-Reply-To: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> References: <7DWIMUXTrlQH2NCCxw-ScsMsux-6WSCPmBpz-OUBYSo=.c178db63-6bd9-4cf2-b8eb-d852d490679b@github.com> Message-ID: On Fri, 16 May 2025 07:58:36 GMT, Marc Chevalier wrote: >> Adding a `StressLoopPeeling` dev flag that randomize peeling. >> >> ## Semantics >> >> For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. 
>> >> This requires to distinguish two things: >> - not inlining because it's not legal: see for instance >> ```cpp >> assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); >> ``` >> in `PhaseIdealLoop::do_peeling` >> - not inlining because it doesn't seem profitable. >> >> Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! >> >> Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. >> >> I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. >> >> >> >> ## The Flag >> >> The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. 
A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. >> >> But once again: let'... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Thanks @chhagedorn and @TobiHartmann for reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25140#issuecomment-2886458303 From mchevalier at openjdk.org Fri May 16 11:35:02 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 11:35:02 GMT Subject: Integrated: 8353638: C2: deoptimization and re-execution cycle with StringBuilder In-Reply-To: References: Message-ID: On Fri, 9 May 2025 14:57:54 GMT, Marc Chevalier wrote: > Unlike what was assumed at first, it is quite different from [JDK-8346989](https://bugs.openjdk.org/browse/JDK-8346989). The problem is actually unrelated to `StringBuilder`, but has to do with the underlying array allocation. > > Here, the problem is that the array allocation function, that is throwing when given a negative length, causes a deopt rather than using the compiled exception handlers. This is an old workaround, and the flag `StressCompiledExceptionHandlers` to rather use compiled handlers instead of deopting was added in [JDK-8004741](https://bugs.openjdk.org/browse/JDK-8004741) in 2012. This flag is used in testing since october 2022. > > So maybe it's time to use the compiled exception handlers! I propose to turn them on by default, and instead, add a diagnostic flag to deopt instead, in case something goes wrong. Doing so improve the performance to match the ones of C1 (both for direct array allocation, and `StringBuilder` construction). 
For instance, with the case given in the JBS issue: > > Stop at level 0 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,277s > user 0m4,214s > sys 0m0,117s > > Stop at level 1 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,104s > user 0m4,079s > sys 0m0,106s > > Stop at level 2 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,308s > user 0m4,239s > sys 0m0,145s > > Stop at level 3 > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,304s > user 0m4,247s > sys 0m0,132s > > Default (Stop at level 4) > CompileCommand: compileonly C.test* bool compileonly = true > > real 0m4,086s > user 0m4,059s > sys 0m0,122s > > > > I've run some tests (up to tier10), it seems all fine, ignoring the usual noise. I've checked with @dougxc, it shouldn't impact Graal as it doesn't use `OptoRuntime`. This pull request has now been integrated. Changeset: a0a30607 Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/a0a3060709473c3ab433fa1485b723ca6c22b7cb Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod 8353638: C2: deoptimization and re-execution cycle with StringBuilder Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25149 From mchevalier at openjdk.org Fri May 16 11:35:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 16 May 2025 11:35:03 GMT Subject: Integrated: 8355488: Add stress mode for C2 loop peeling In-Reply-To: References: Message-ID: <9bvpQQlHeUEuAwHQJ2HRKRz7e01bkWGyA-c0f19wf6w=.847f5bce-032d-4b46-8557-e06ae00a54f8@github.com> On Fri, 9 May 2025 11:31:27 GMT, Marc Chevalier wrote: > Adding a `StressLoopPeeling` dev flag that randomize peeling. > > ## Semantics > > For now, the direction I've taken is to randomly take a decision in case of peeling, otherwise, rely on existing heuristics. 
> > This requires to distinguish two things: > - not inlining because it's not legal: see for instance > ```cpp > assert(cl->trip_count() > 0, "peeling a fully unrolled loop"); > ``` > in `PhaseIdealLoop::do_peeling` > - not inlining because it doesn't seem profitable. > > Peeling loops without a good reason (not containing an exiting `If` whose condition is not a member of the loop) but without a concrete way to forbid it should always be allowed. Let's stress it! > > Peeling too many times is not a great idea either. It uses a lot of memory, of nodes... Also, it may prevent other optimisations from kicking in. And what about interaction with future stress flags? Let's limit peeling: we give a fixed number of opportunities to peel before we give up on peeling for good. That is not the same as limiting the amount of peeling we do. Indeed, if we bound the number of times we say "yes, please, peel" given enough requests, we will always reach the bound. If we limit the number of requests, we have a more evenly distributed amount of peeling, between 0 and the bound. > > I've tried without the bound: I couldn't find any bug without the bound that would not reproduce with the bound. It only save some legitimate memory problems. Without a bound on the number of peeling opportunities, hotspot eats a lot of memory, but all the allocations seems reasonable: it just seems we ask too much. We could limit the number of nodes, to prevent peeling before we reach the memory limit, but that would also hinder other optimizations and (future) stress flags. > > > > ## The Flag > > The flag is very specialized, unlike a `StressLoopOpts` would be. My idea so far is "let's see". My idea is that it's good to be able to enable stress optimizations selectively, and have a flag like `StressLoopOpts` that would turn them all: we could use the general one in testing, and the finer-grain ones when debugging. 
A reason for that is that I don't see a real use-case for stressing some features but not others (which would make the number of combinations explode): having (for instance) `+StressLoopUnrolling +StressLoopPeeling` would sometimes behave like `+StressLoopUnrolling -StressLoopPeeling`, and so it's not very useful to test the latter. > > But once again: let's see what happens. > > > ## On the Code > > The field `_peel... This pull request has now been integrated. Changeset: 0d867578 Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/0d8675780f28d25ed538589480cc208b48fe7e93 Stats: 26 lines in 4 files changed: 24 ins; 0 del; 2 mod 8355488: Add stress mode for C2 loop peeling Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25140 From chagedorn at openjdk.org Fri May 16 12:05:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 16 May 2025 12:05:14 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v32] In-Reply-To: <32OxhVRhwuY_Flt3Dmo-mcU5ruQIptcC2lBATGpQdZc=.ceeb5e58-b083-445d-a7dd-131380c75508@github.com> References: <32OxhVRhwuY_Flt3Dmo-mcU5ruQIptcC2lBATGpQdZc=.ceeb5e58-b083-445d-a7dd-131380c75508@github.com> Message-ID: On Fri, 16 May 2025 10:10:44 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix up review suggestions Some more comments, I will take this up again on Monday. Looking good so far :-) Great work! test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 36: > 34: * {@link Hook}s can be added to a frame, which allows code to be inserted at that location later. > 35: * When a {@link Hook} is {@link Hook#set}, this separates the Template into an outer and inner > 36: * {@link CodeFrame}, ensuring that names that are {@link Template#defineName}'d inside the inner frame `define` and `defineName` seem to no longer exist. test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 91: > 89: > 90: /** > 91: * Creates a special frame, which has a {@link parent} but uses the {@link NameSet} Suggestion: * Creates a base frame, which has no {@link #parent}. */ public static CodeFrame makeBase() { return new CodeFrame(null, false); } /** * Creates a normal frame, which has a {@link #parent} and which defines an inner * {@link NameSet}, for the names that are generated inside this frame. Once this * frame is exited, the name from inside this frame are not available anymore. */ public static CodeFrame make(CodeFrame parent) { return new CodeFrame(parent, false); } /** * Creates a special frame, which has a {@link #parent} but uses the {@link NameSet} test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 39: > 37: * > 38: * The {@link Renderer} instance keeps track of the current frames, > 39: * see {@link TemplateFrame} and {@link CodeFrame}. I suggest to mention that we render a tokenized template with this class. 
You can also use Javadocs `@see`: Suggestion: * The {@link Renderer} class renders a tokenized {@link Template} in the form of a {@link TemplateToken}. * It also keeps track of the states during a nested Template rendering. There can only be a single * {@link Renderer} active at any point, since there are static methods that reference * {@link Renderer#getCurrent}. * *

* The {@link Renderer} instance keeps track of the current frames. * * @see TemplateFrame * @see CodeFrame test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 47: > 45: /** > 46: * There can be at most one Renderer instance at any time. > 47: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 56: > 54: * that the inner {@link Template} has access to the outer {@link Template}, but they would actually > 55: * be separated. This could lead to unexpected behavior or even bugs. > 56: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 61: > 59: * This way, the inner and outer {@link Template}s get rendered together, and the inner {@link Template} > 60: * has access to the {@link Name}s and {@link Hook}s of the outer {@link Template}. > 61: * Suggestion: * *

test/hotspot/jtreg/compiler/lib/template_framework/Renderer.java line 71: > 69: private TemplateFrame baseTemplateFrame; > 70: private TemplateFrame currentTemplateFrame; > 71: private CodeFrame baseCodeFrame; Can be made final: Suggestion: private final TemplateFrame baseTemplateFrame; private TemplateFrame currentTemplateFrame; private final CodeFrame baseCodeFrame; test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 31: > 29: /** > 30: * The {@link TemplateFrame} is the frame for a {@link Template}, i.e. the corresponding > 31: * {@link TemplateToken}. It ensures that each template use has its own unique {@link id} Suggestion: * {@link TemplateToken}. It ensures that each template use has its own unique {@link #id} test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 35: > 33: * replacements, which combine the key-value pairs from the template argument and the > 34: * {@link Template#let} definitions. The {@link parent} relationship provides a trace > 35: * for the use chain of templates. The {@link fuel} is reduced over this chain, to give Suggestion: * {@link Template#let} definitions. The {@link #parent} relationship provides a trace * for the use chain of templates. The {@link #fuel} is reduced over this chain, to give test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 47: > 45: final Map hashtagReplacements = new HashMap<>(); > 46: final float fuel; > 47: float fuelCost; Some can be made `private`. 
Maybe you want to double check if you don't want to make them all private and just offer accessor methods for those that are used by the `Renderer` instead: Suggestion: final TemplateFrame parent; private final int id; private final Map hashtagReplacements = new HashMap<>(); final float fuel; private float fuelCost; test/hotspot/jtreg/compiler/lib/template_framework/TemplateFrame.java line 73: > 71: return; > 72: } > 73: throw new RendererException("Duplicate hashtag replacement for #" + key); Could be simplified to: Suggestion: if (hashtagReplacements.putIfAbsent(key, value) != null) { throw new RendererException("Duplicate hashtag replacement for #" + key); } test/hotspot/jtreg/compiler/lib/template_framework/TemplateToken.java line 28: > 26: /** > 27: * Represents a Template with filled arguments, ready for instantiation, either > 28: * as a {@link Token} inside another {@link Template} or with {@link #render}. Now that we renamed `fill()` to `asToken()`, how about: Suggestion: * Represents a tokenized {@link Template} (after calling {@code asToken()}) ready for * instantiation either as a {@link Token} inside another {@link Template} or as * a {@link String} with {@link #render}. test/hotspot/jtreg/compiler/lib/template_framework/TemplateToken.java line 41: > 39: * Represents a zero-argument {@link TemplateToken}, already filled with arguments, ready for > 40: * instantiation either as a {@link Token} inside another {@link Template} or > 41: * with {@link #render}. Following the above suggestion, how about: Suggestion: * Represents a tokenized zero-argument {@link Template} ready for instantiation * either as a {@link Token} inside another {@link Template} or as a {@link String} * with {@link #render}. 
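To illustrate the `putIfAbsent` simplification suggested above for `TemplateFrame.java` line 73, here is a minimal standalone sketch. It is not the framework's code: the class name `HashtagReplacementsSketch`, both method names, and the use of `IllegalStateException` in place of `RendererException` are assumptions made only so the example compiles on its own, and the "explicit" form is a guess at the shape of the code being replaced, since the review only quotes its last three lines. It also assumes values are never `null`, because `Map.putIfAbsent` returns `null` both when the key was absent and when it was mapped to `null`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the hashtag-replacement bookkeeping discussed in the review.
final class HashtagReplacementsSketch {
    private final Map<String, String> replacements = new HashMap<>();

    // Explicit form (assumed shape of the original): check first, then insert.
    void addWithContainsKey(String key, String value) {
        if (!replacements.containsKey(key)) {
            replacements.put(key, value);
            return;
        }
        // IllegalStateException stands in for the framework's RendererException.
        throw new IllegalStateException("Duplicate hashtag replacement for #" + key);
    }

    // Compact form from the suggestion: putIfAbsent returns the previous value,
    // which is non-null exactly when the key was already mapped (values are non-null here).
    void addWithPutIfAbsent(String key, String value) {
        if (replacements.putIfAbsent(key, value) != null) {
            throw new IllegalStateException("Duplicate hashtag replacement for #" + key);
        }
    }

    String get(String key) {
        return replacements.get(key);
    }

    public static void main(String[] args) {
        HashtagReplacementsSketch sketch = new HashtagReplacementsSketch();
        sketch.addWithPutIfAbsent("name", "var_7");
        try {
            sketch.addWithPutIfAbsent("name", "var_42");
        } catch (IllegalStateException e) {
            // The first mapping is kept; the duplicate is rejected.
            System.out.println(e.getMessage() + " (kept " + sketch.get("name") + ")");
        }
    }
}
```

Both forms leave the first mapping untouched when a duplicate is rejected; the compact one simply does the lookup and the insert in a single map operation.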
test/hotspot/jtreg/compiler/lib/template_framework/TemplateToken.java line 61: > 59: /** > 60: * Represents a one-argument {@link TemplateToken}, already filled with arguments, ready for instantiation > 61: * either as a {@link Token} inside another {@link Template} or with {@link #render}. Similarly: Suggestion: * Represents a tokenized one-argument {@link Template}, already filled with arguments, ready for * instantiation either as a {@link Token} inside another {@link Template} or as a {@link String} * with {@link #render}. test/hotspot/jtreg/compiler/lib/template_framework/TemplateToken.java line 87: > 85: /** > 86: * Represents a two-argument {@link TemplateToken}, already filled with arguments, ready for instantiation > 87: * either as a {@link Token} inside another {@link Template} or with {@link #render}. Suggestion: * Represents a tokenized two-argument {@link Template}, already filled with arguments, ready for * instantiation either as a {@link Token} inside another {@link Template} or as a {@link String} * with {@link #render}. test/hotspot/jtreg/compiler/lib/template_framework/TemplateToken.java line 117: > 115: /** > 116: * Represents a three-argument {@link TemplateToken}, already filled with arguments, ready for instantiation > 117: * either as a {@link Token} inside another {@link Template} or with {@link #render}. Suggestion: * Represents a tokenized three-argument {@link TemplateToken}, already filled with arguments, ready for * instantiation either as a {@link Token} inside another {@link Template} or as a {@link String} * with {@link #render}. test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 33: > 31: * The {@link Template#body} and {@link Hook#set} are given a list of tokens, which are either > 32: * {@link Token}s or {@link String}s or some permitted boxed primitives. These are then parsed > 33: * and all non {@link Token}s are converted to {@link StringToken}s. 
The parsing also flattens Suggestion: * and all non-{@link Token}s are converted to {@link StringToken}s. The parsing also flattens test/hotspot/jtreg/compiler/lib/template_framework/Token.java line 51: > 49: throw new IllegalArgumentException("Unexpected tokens: null"); > 50: } > 51: List outputList = new ArrayList(); Can be simplified: Suggestion: List outputList = new ArrayList<>(); ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2846308913 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092913603 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092914802 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092871329 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092871616 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092871775 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092871953 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092874930 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092876424 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092877304 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092881685 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092898464 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092854996 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092857864 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092859476 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092861819 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092862742 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092844398 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092845249 From epeter at openjdk.org Fri May 16 12:08:19 2025 From: 
epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 12:08:19 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v33] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. 
`var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Name test with subtyping - more offline suggestions by Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/f655f139..3637038d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=31-32 Stats: 95 lines in 2 files changed: 88 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 12:22:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 12:22:22 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v34] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). 
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/template_framework/Token.java Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/3637038d..c9c484cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=32-33 Stats: 43 lines in 5 files changed: 11 ins; 2 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 12:33:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 12:33:49 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v35] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix to addName ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/c9c484cd..398a7c1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=33-34 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 16 12:33:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 12:33:50 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v32] In-Reply-To: References: <32OxhVRhwuY_Flt3Dmo-mcU5ruQIptcC2lBATGpQdZc=.ceeb5e58-b083-445d-a7dd-131380c75508@github.com> Message-ID: <4LFHXEAtxic6sA6szZuKXdn6HouBKLtH2iUA2h3hzpw=.a024cdc0-0166-463a-9352-c3e38424b617@github.com> On Fri, 16 May 2025 12:02:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix up review suggestions > > Some more comments, I will take this up again on Monday. Looking good so far :-) Great work! @chhagedorn Thanks for all the suggestions, we are making good progress! I have just added a test for subtype use of Names, and will improve the tutorial with some more cases now. > test/hotspot/jtreg/compiler/lib/template_framework/CodeFrame.java line 36: > >> 34: * {@link Hook}s can be added to a frame, which allows code to be inserted at that location later. 
>> 35: * When a {@link Hook} is {@link Hook#set}, this separates the Template into an outer and inner >> 36: * {@link CodeFrame}, ensuring that names that are {@link Template#defineName}'d inside the inner frame > > `define` and `defineName` seem to no longer exist. Ah, refactoring artifact. Should be `addName`. Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2886588658 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2092952214 From epeter at openjdk.org Fri May 16 12:39:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 12:39:44 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v36] In-Reply-To: References: Message-ID: <82W4hgQqSURcidwPYRzq32fy7bNRmRLmAoJe9oIVgJI=.c970a963-e96d-47b0-b715-907e10557428@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix whitespaces from applied suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/398a7c1b..3a7493cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=34-35 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From roland at openjdk.org Fri May 16 12:45:16 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 May 2025 12:45:16 GMT Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling [v2] In-Reply-To: References: Message-ID: > This is an issue similar to 8349139: the type of the iv phi of a > counted loop is narrowed down so a `Div` node doesn't need a control > input. The loop is then peeled. The divisor of the `Div` in the loop body is > guaranteed to be non-zero only if the `Div` is actually executed, so the `Div` > is implicitly dependent on the zero trip guard. Then the loop loses > its backedge and the `Div` freely floats. The `Div` instruction is > scheduled above the zero trip guard and faults. Had the `Div` been > control dependent on the zero trip guard, it wouldn't have > executed. The fix, similar to 8349139, is to add a `CastII` on peeling > to make explicit the dependency between the zero trip guard and whatever > in the loop body relies on the narrowed down type of the iv phi.
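For readers less familiar with the pattern, the following is a hypothetical source-level sketch of the kind of loop shape the description above refers to. It is not the regression test added by the PR and is not claimed to reproduce the bug; the class and method names are invented. It only illustrates why the division is safe under Java semantics: the divisor `i` is the loop's induction variable, which is always negative inside the body, and the body only runs if the zero trip guard (`start < 0`) passes.

```java
// Hypothetical illustration only; not the test added by the PR.
class PeeledLoopDivSketch {
    static int sumOfQuotients(int start, int x) {
        int res = 0;
        // Inside the body, i lies in [start, -1], so it can never be zero.
        // If start >= 0, the guard fails and the body (and its division) never runs.
        for (int i = start; i < 0; i++) {
            res += x / i;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(sumOfQuotients(-2, 6)); // 6 / -2 + 6 / -1
        System.out.println(sumOfQuotients(0, 6));  // loop body never entered
    }
}
```

If a compiler were to evaluate a division by `start` before checking `start < 0`, a call with `start == 0` would fault even though the source never divides by zero. That is the class of failure that making the dependency on the zero trip guard explicit is meant to rule out.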
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25262/files - new: https://git.openjdk.org/jdk/pull/25262/files/eb3a13b3..f5069b5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25262&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25262&range=00-01 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25262/head:pull/25262 PR: https://git.openjdk.org/jdk/pull/25262 From roland at openjdk.org Fri May 16 12:45:17 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 May 2025 12:45:17 GMT Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling [v2] In-Reply-To: <_4lQuZ3SJIv6N52RBHUYXQ0VRXwuZSzBbwUWcu7QvXM=.8da9e8bb-7e94-4cc3-8722-72bc2e670326@github.com> References: <_4lQuZ3SJIv6N52RBHUYXQ0VRXwuZSzBbwUWcu7QvXM=.8da9e8bb-7e94-4cc3-8722-72bc2e670326@github.com> Message-ID: On Fri, 16 May 2025 09:28:51 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > test/hotspot/jtreg/compiler/controldependency/TestPeeledLoopNoBackedgeFloatingDiv.java line 29: > >> 27: * @summary C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling >> 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseLoopPredicate -XX:+StressGCM -XX:StressSeed=31780379 TestPeeledLoopNoBackedgeFloatingDiv >> 29: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:-UseLoopPredicate -XX:+StressGCM TestPeeledLoopNoBackedgeFloatingDiv > > You should use `-XX:+UnlockDiagnosticVMOptions` for `StressGCM` and maybe also add `-XX:+IgnoreUnrecognizedVMOptions` when run without C2 
since `UseLoopPredicate` is a C2 only flag. Thanks for reviewing this. Done in the new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25262#discussion_r2092973861 From aph at openjdk.org Fri May 16 13:00:57 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 16 May 2025 13:00:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 07:44:53 GMT, Roberto Castañeda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: > > Replace control type with PhaseCFG::is_CFG test src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 131: > 129: !MacroAssembler::legitimize_address_requires_lea(ref_addr, size), > 130: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); > 131: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); I just saw this. I think it might be simpler and better to handle this case in the segfault handler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2092999625 From roland at openjdk.org Fri May 16 13:09:03 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 May 2025 13:09:03 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 16 May 2025 09:28:15 GMT, Roberto Castañeda Lozano wrote: > Note that I am not necessarily suggesting disabling "late" elimination of allocations at macro expansion.
But it would be good, in light of the above findings, to find actual cases where the seemingly simpler alternative of removing dead allocations early is not sufficient for correctness, to motivate the more complex approach proposed in this PR. What happens with this bug is that a Phi created sometime after parsing inherits the type of the projection out of the Initialize, which is wrong. No issue happens until the allocation is removed though. Having allocations removed only early merely shortens the window where bad things (a new Phi) can happen. But bad things could still happen. After all, we do some loop opts to help EA so maybe a similar issue could happen there. Or maybe, down the road, someone will change the way we do loop opts during EA because it helps EA and the bug will be back, but we won't necessarily notice it until it happens at a user's site. Beyond that, you're suggesting restricting elimination of allocation. What if, down the road, someone notices that it gets in the way of some other optimization? Then that someone will have to reconstruct the history here. There's a history in C2 of fixing complicated issues by working around them. Often what happens is that we, later on, realize that the first workaround wasn't sufficient and try to pile on more workarounds. I also already ran into situations where everything to perform the needed optimization is there but it's disabled for some reason in a particular case and it's unclear why. I'm in favor of fixing issues once and for all by targeting their root cause and trying to not sacrifice performance whenever possible. Truth is we have a limited view of what people are running through a set of microbenchmarks, but we can't be sure something has no performance impact only because that set of microbenchmarks doesn't regress. Beyond that, what can appear as a safer workaround today could be more trouble down the line and in the end cause more confusion and work.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2886673287 From thartmann at openjdk.org Fri May 16 13:19:55 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 16 May 2025 13:19:55 GMT Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling [v2] In-Reply-To: References: Message-ID: <-n4NMccK5Dn1ud60aMQ7VCzOz_nXB76BECvqXojFXVw=.7d6bbfa3-b9fe-4779-8974-61267ef343c7@github.com> On Fri, 16 May 2025 12:45:16 GMT, Roland Westrelin wrote: >> This is an issue similar to 8349139: the type of the iv phi of a >> counted loop is narrowed down so a `Div` node doesn't need a control >> input. The loop is then peeled. The divisor of the `Div` in the loop body is >> guaranteed to be non-zero only if the `Div` is actually executed, so the `Div` >> is implicitly dependent on the zero trip guard. Then the loop loses >> its backedge and the `Div` freely floats. The `Div` instruction is >> scheduled above the zero trip guard and faults. Had the `Div` been >> control dependent on the zero trip guard, it wouldn't have >> executed. The fix, similar to 8349139, is to add a `CastII` on peeling >> to make explicit the dependency between the zero trip guard and whatever >> in the loop body relies on the narrowed down type of the iv phi. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25262#pullrequestreview-2846609906 From epeter at openjdk.org Fri May 16 13:46:17 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 13:46:17 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v37] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. 
> - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve tutorial for Names part1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/3a7493cf..cc60064b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=35-36 Stats: 149 lines in 1 file changed: 127 ins; 9 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From roland at openjdk.org Fri May 16 14:21:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 16 May 2025 14:21:47 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null Message-ID: During IGVN, `TypeNode::make_paths_from_here_dead()` follows data nodes until a `Phi`. The `Region` input for the input that that logic goes through to reach the `Phi` is `null` causing the crash. I propose simply adding an extra check for that corner case. 
------------- Commit messages: - test - fix Changes: https://git.openjdk.org/jdk/pull/25268/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25268&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355230 Stats: 104 lines in 2 files changed: 103 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25268.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25268/head:pull/25268 PR: https://git.openjdk.org/jdk/pull/25268 From epeter at openjdk.org Fri May 16 14:36:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 14:36:43 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v38] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
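As a rough illustration of the `#name` and `$var` features listed above, here is a toy renderer. It is not the Template Framework API: the class `ToyTemplate`, the method `render`, and the explicit `id` parameter are hypothetical names made up for this sketch. It only shows the idea that `#name` is replaced by an argument value and `$var` gets a per-use suffix, so that two uses of the same template do not collide.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy sketch only; NOT the real Template Framework. All names are hypothetical.
public class ToyTemplate {
    // Matches "#name" (argument replacement) or "$name" (variable renaming).
    private static final Pattern MARKER =
        Pattern.compile("([#$])([A-Za-z_][A-Za-z0-9_]*)");

    // Renders one use of a template body. 'id' identifies this particular use
    // and is appended to every $variable to keep its name unique.
    static String render(String body, Map<String, ?> args, int id) {
        Matcher m = MARKER.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (m.group(1).equals("#")) {
                if (!args.containsKey(name)) {
                    throw new IllegalArgumentException("missing argument: " + name);
                }
                replacement = String.valueOf(args.get(name));
            } else {
                replacement = name + "_" + id;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "int $var = #a;";
        // Two uses of the same body get different variable names.
        System.out.println(render(body, Map.of("a", 42), 7));
        System.out.println(render(body, Map.of("a", 42), 42));
    }
}
```

Passing the `id` explicitly keeps the sketch deterministic. How the real framework picks its unique suffixes is not described in this mail, so that part is left open here.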
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve Name tutorial part2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/cc60064b..df3ff718 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=36-37 Stats: 92 lines in 1 file changed: 90 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From rcastanedalo at openjdk.org Fri May 16 15:02:06 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 16 May 2025 15:02:06 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 12:57:54 GMT, Andrew Haley wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace control type with PhaseCFG::is_CFG test > > src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 131: > >> 129: !MacroAssembler::legitimize_address_requires_lea(ref_addr, size), >> 130: "an instruction that can be used for implicit null checking should emit the candidate memory access first"); >> 131: ref_addr = __ legitimize_address(ref_addr, size, rscratch2); > > I just saw this. I think it might be simpler and better to handle this case in the segfault handler. OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added.
I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2093219601 From asmehra at openjdk.org Fri May 16 15:08:02 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 16 May 2025 15:08:02 GMT Subject: RFR: 8357084: Zero build fails after JDK-8354887 Message-ID: This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. ------------- Commit messages: - 8357084: Zero build fails after JDK-8354887 Changes: https://git.openjdk.org/jdk/pull/25269/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25269&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357084 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25269.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25269/head:pull/25269 PR: https://git.openjdk.org/jdk/pull/25269 From ccheung at openjdk.org Fri May 16 15:15:50 2025 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 16 May 2025 15:15:50 GMT Subject: RFR: 8357084: Zero build fails after JDK-8354887 In-Reply-To: References: Message-ID: <9qfETrrBrS9gwtJpi5YAv5wi3tcQqSN-k-ZQR9NmBJo=.1b7489be-7c22-4e93-aa8f-8e2c86f6bd73@github.com> On Fri, 16 May 2025 15:03:32 GMT, Ashutosh Mehra wrote: > This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. LGTM ------------- Marked as reviewed by ccheung (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25269#pullrequestreview-2846941134 From asmehra at openjdk.org Fri May 16 15:15:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 16 May 2025 15:15:51 GMT Subject: RFR: 8357084: Zero build fails after JDK-8354887 In-Reply-To: References: Message-ID: On Fri, 16 May 2025 15:03:32 GMT, Ashutosh Mehra wrote: > This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. @vnkozlov a trivial fix. can you please review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25269#issuecomment-2887009141 From epeter at openjdk.org Fri May 16 15:31:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 16 May 2025 15:31:42 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v39] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: tutorial with mutable and subtyping ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/df3ff718..fcbd76a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=37-38 Stats: 135 lines in 2 files changed: 116 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From asmehra at openjdk.org Fri May 16 15:37:51 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 16 May 2025 15:37:51 GMT Subject: RFR: 8357084: Zero build fails after JDK-8354887 In-Reply-To: <9qfETrrBrS9gwtJpi5YAv5wi3tcQqSN-k-ZQR9NmBJo=.1b7489be-7c22-4e93-aa8f-8e2c86f6bd73@github.com> References: <9qfETrrBrS9gwtJpi5YAv5wi3tcQqSN-k-ZQR9NmBJo=.1b7489be-7c22-4e93-aa8f-8e2c86f6bd73@github.com> Message-ID: <9Ap11IgRjEV-lXWC2kkWOdHKFAZOOxKwukhD_VUiT6k=.f87be6ef-8218-4364-99a6-acf7e6de3c59@github.com> On Fri, 16 May 2025 15:13:38 GMT, Calvin Cheung wrote: >> This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. > > LGTM @calvinccheung thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25269#issuecomment-2887067342 From kvn at openjdk.org Fri May 16 15:42:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 16 May 2025 15:42:57 GMT Subject: RFR: 8357084: Zero build fails after JDK-8354887 In-Reply-To: References: Message-ID: On Fri, 16 May 2025 15:03:32 GMT, Ashutosh Mehra wrote: > This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25269#pullrequestreview-2847007077 From jbhateja at openjdk.org Fri May 16 16:09:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 16 May 2025 16:09:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: On Fri, 16 May 2025 01:23:48 GMT, Srinivas Vamsi Parasa wrote: >> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. >> eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] >> eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] > >> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] > > Hank's script is already handling the variant[2] in the `RegMemRegNddInstruction` class, for which no demotion is enabled. The demotion is enabled only for variant[1]. > > Hi @vamsi-parasa , I don't see demotion tests being generated with full mode gtest, i.e. python3 x86-asmtest.py --full > > Please see the updated `x86-asmtest.py` refactored to work with full set (`--full`). Please let me know if anything is missing. 
Hi @vamsi-parasa , I am seeing some failures with --full mode when ENABLE_DEMOTION=False /home/jatinbha/sandboxes/apx-release/jdk/test/hotspot/gtest/x86/test_assembler_x86.cpp:61: Failure Failed __ ecmovq (Assembler::Condition::greater, r31, r31, Address(rcx, rdx, (Address::ScaleFactor)0, +0x3c8d1915)); OpenJDK: cc cc cc cc cc cc cc cc cc cc cc GNU Assembler: 62 64 84 10 4f bc 11 15 19 8d 3c [ FAILED ] AssemblerX86.validate_vm (13562 ms) [----------] 1 test from AssemblerX86 (13708 ms total) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2887139356 From enikitin at openjdk.org Fri May 16 16:23:56 2025 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 16 May 2025 16:23:56 GMT Subject: Integrated: 8356702: CTW: Update modules In-Reply-To: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> References: <5_pxWyLzGtPZEDsJKkq6i5wFIemDsY-OeXTgkVO_kuk=.ed16944a-2e41-4c19-a27c-6c1a8269da42@github.com> Message-ID: On Mon, 12 May 2025 05:57:46 GMT, Evgeny Nikitin wrote: > This PR enhances CTW test wrappers generator in order to make it more user-friendly. Added features are: > > 1. Automatic scanning for modules list under `open/src` > 2. Automatic recognition of current year; > 3. Multi-wrapper modules support (allows for splitting huge modules into 2 and more wrappers) > 4. ability to exclude modules; > > The updated generator have been used to refresh JTReg module wrappers. > The most meaningful change is contained in the `generate.bash` > Testing: `open/test/hotspot/jtreg/applications/ctw/modules` with the supported platforms, no failures spotted. This pull request has now been integrated. 
Changeset: d5245092 Author: Evgeny Nikitin Committer: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/d5245092249ed400f98711393e25e0ae97990daf Stats: 49 lines in 1 file changed: 40 ins; 0 del; 9 mod 8356702: CTW: Update modules Reviewed-by: lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/25175 From sparasa at openjdk.org Fri May 16 17:05:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 16 May 2025 17:05:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: On Fri, 16 May 2025 01:23:48 GMT, Srinivas Vamsi Parasa wrote: >> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. >> eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] >> eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] > >> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] > > Hank's script is already handling the variant[2] in the `RegMemRegNddInstruction` class, for which no demotion is enabled. The demotion is enabled only for variant[1]. > Hi @vamsi-parasa , I am seeing some failures with --full mode when ENABLE_DEMOTION=False To investigate the issue, ran Hank's tests with `ENABLE_DEMOTION=False` (without the full_set) and looked at the errors. 
Even though the demotion flag was turned off in Hank's test script, demotion is enabled by default in this PR for OpenJDK and there's no corresponding flag in `src/hotspot/cpu/x86/assembler_x86.cpp` that allows demotion to be off in sync with the test script. Thus, `ENABLE_DEMOTION=False` can't be used with this PR. To address this, we can update x86-asmtest.py to not use this flag and enable demotion by default. Please let me know. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2887255797 From asmehra at openjdk.org Fri May 16 17:23:57 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 16 May 2025 17:23:57 GMT Subject: Integrated: 8357084: Zero build fails after JDK-8354887 In-Reply-To: References: Message-ID: On Fri, 16 May 2025 15:03:32 GMT, Ashutosh Mehra wrote: > This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. This pull request has now been integrated. Changeset: 63ef90be Author: Ashutosh Mehra URL: https://git.openjdk.org/jdk/commit/63ef90be971267a1d3ceb6b7a03b570c34ac4d06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8357084: Zero build fails after JDK-8354887 Reviewed-by: ccheung, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25269 From asmehra at openjdk.org Fri May 16 17:23:56 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Fri, 16 May 2025 17:23:56 GMT Subject: RFR: 8357084: Zero build fails after JDK-8354887 In-Reply-To: References: Message-ID: On Fri, 16 May 2025 15:40:29 GMT, Vladimir Kozlov wrote: >> This fixes compile failure in zero variant on aarch64. Verified by compiling on zero variant on linux-aarch64. > > Good. @vnkozlov thank you for reviewing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25269#issuecomment-2887287664 From sparasa at openjdk.org Fri May 16 17:42:18 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 16 May 2025 17:42:18 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. > > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update x86-asmtest.py to enable demotion by default and make test generation optional ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/65656aae..b2e8fd2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=25-26 Stats: 2757 lines in 2 files changed: 204 ins; 356 del; 2197 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Fri May 16 17:47:56 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 16 May 2025 17:47:56 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: 
<8LrB8qVwcOxRccZp6uwJdK13JRTr_lXAqx16je8NWio=.e70b7f3a-2dde-4609-a3ac-8c872c42d389@github.com> On Fri, 16 May 2025 16:07:32 GMT, Jatin Bhateja wrote: >>> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] >> >> Hank's script is already handling the variant[2] in the `RegMemRegNddInstruction` class, for which no demotion is enabled. The demotion is enabled only for variant[1]. > >> > Hi @vamsi-parasa , I don't see demotion tests being generated with full mode gtest, i.e. python3 x86-asmtest.py --full >> >> Please see the updated `x86-asmtest.py` refactored to work with full set (`--full`). Please let me know if anything is missing. > > Hi @vamsi-parasa , > I am seeing some failures with --full mode when ENABLE_DEMOTION=False > /home/jatinbha/sandboxes/apx-release/jdk/test/hotspot/gtest/x86/test_assembler_x86.cpp:61: Failure > Failed > __ ecmovq (Assembler::Condition::greater, r31, r31, Address(rcx, rdx, (Address::ScaleFactor)0, +0x3c8d1915)); > OpenJDK: cc cc cc cc cc cc cc cc cc cc cc > GNU Assembler: 62 64 84 10 4f bc 11 15 19 8d 3c > [ FAILED ] AssemblerX86.validate_vm (13562 ms) > [----------] 1 test from AssemblerX86 (13708 ms total) Hi Jatin (@jatin-bhateja), as a follow-up to the previous comment, Hank's test script was updated to enable demotion by default. Thus, the flag `ENABLE_DEMOTION` was removed. The new flag is `TEST_DEMOTION=True/False`. This will explicitly enable or disable the generation of testcases for demotion. If `TEST_DEMOTION=True`, then test cases with `dst==src1` will be explicitly generated. Ran the full_set with `TEST_DEMOTION=True and False` and verified that it works. Please let me know if you see any further issues. 
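For readers following the thread, the demotion rule from the PR description (`dst == src1`) can be sketched as below. This is only an illustration under simplifying assumptions, not the code in `assembler_x86.cpp`: the class `NddDemotionSketch`, the enum `Encoding`, and the method `chooseEncoding` are hypothetical names. Registers are modelled as plain numbers 0..31, and the sketch ignores the `no_flags` parameter and memory operands, which the real assembler also has to consider.

```java
// Illustrative sketch only; names are hypothetical and many real-world
// conditions (no_flags, memory operands, operand sizes) are ignored.
public class NddDemotionSketch {
    enum Encoding { EVEX, REX2, LEGACY }

    // APX adds general purpose registers r16..r31; encoding them in a
    // non-EVEX instruction needs the REX2 prefix.
    static boolean isExtended(int reg) {
        return reg >= 16;
    }

    // Chooses an encoding for a register-register NDD instruction such as
    // "eaddl dst, src1, src2".
    static Encoding chooseEncoding(int dst, int src1, int src2) {
        if (dst != src1) {
            // A distinct destination really needs the NDD form.
            return Encoding.EVEX;
        }
        // dst == src1: the instruction behaves like the two-operand form.
        if (isExtended(dst) || isExtended(src2)) {
            return Encoding.REX2;
        }
        return Encoding.LEGACY;
    }

    public static void main(String[] args) {
        System.out.println(chooseEncoding(18, 18, 25)); // eaddl r18, r18, r25
        System.out.println(chooseEncoding(2, 2, 7));    // eaddl r2, r2, r7
        System.out.println(chooseEncoding(2, 3, 7));    // dst != src1
    }
}
```

The first two calls correspond to the two examples in the PR description. The third is the case that `TEST_DEMOTION=False` test generation would mostly exercise, since there `dst` and `src1` are not forced to be equal.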
------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2887336061 From jrose at openjdk.org Fri May 16 19:49:56 2025 From: jrose at openjdk.org (John R Rose) Date: Fri, 16 May 2025 19:49:56 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:26:30 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Move intrinsic to be a subsection; just one most common function of the annotation > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti > - Shorter first sentence > - Updates, thanks to John > - Refine validation and defensive copying > - 8355223: Improve documentation on @IntrinsicCandidate src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 33: > 31: * The {@code @IntrinsicCandidate} indicates that an annotated method is > 32: * recognized by {@code vmIntrinsics.hpp} for special treatment by the HotSpot > 33: * VM, unlike the other methods. 
The HotSpot VM checks, when loading a class, /, unlike the other methods/s/the other/other/ If you want something at the top that teaches about "the other methods", maybe say something like: > Normally, the VM executes methods using only their bytecodes (or JNI definitions), but the VM is (in addition) allowed to use specialized code or analysis logic for methods marked as intrinsics. Thus, intrinsic calls can be replaced or modified or specially optimized by the VM. But those points will be covered further down, so maybe not. More concisely: s/, unlike the other methods/, unlike normal methods, which depend only on bytecodes or native entry points for their meaning to the VM/ Or just: s/, unlike the other methods// Also, s/The HotSpot VM checks, when loading a class,/When loading a class, the HotSpot VM checks/ (fewer commas, better logical flow) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093593792 From jrose at openjdk.org Fri May 16 20:23:57 2025 From: jrose at openjdk.org (John R Rose) Date: Fri, 16 May 2025 20:23:57 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:26:30 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Move intrinsic to be a subsection; just one most common function of the annotation > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti > - Shorter first sentence > - Updates, thanks to John > - Refine validation and defensive copying > - 8355223: Improve documentation on @IntrinsicCandidate Marked as reviewed by jrose (Reviewer). src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 38: > 36: * > 37: *

Intrinsification

> 38: * The most frequently special treatment is intrinsification, which replaces a > The most frequently special treatment is intrinsification, which Better: > Most frequently, the special treatment of an intrinsic is intrinsification, which src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 47: > 45: * intrinsics necessary. > 46: *

> 47: * Intrinsification may never happen, or happen at any moment during execution. s/or happen/or may happen/ (easier to parse) src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 53: > 51: * intrinsic implementors must ensure that non-bytecode execution has the same > 52: * results as execution of the actual Java code in all application contexts > 53: * (given the assumptions on the arguments hold). s/given the/given that the/ src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 57: > 55: * A candidate method should contain a minimal piece of Java code that should be > 56: * replaced by an intrinsic wholesale. The backing intrinsic is (in the common > 57: * case) unsafe - they do not perform checks, but have s/they do not perform/it may omit safety/ s/but have/but instead makes/ s/their/its/ s/that can ensure type safety/that type safety is ensured elsewhere/ src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 67: > 65: * accessing a field or method on an object which does not possess that field or > 66: * method; accessing an element of an array not actually present in the array; > 67: * and manipulating managed references in a way that prevents the GC from ? s/managed references/object references/ ("manage" is mentioned a moment later) src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 90: > 88: * intrinsic.) For example, the documentation can simply say that the result is > 89: * undefined if a race happens. However, race conditions must not lead to > 90: * program failures or type safety breaches, as listed above. Maybe add a teaching paragraph: > Reasoning about such race conditions is difficult, but it is a necessary skill when working with intrinsics that can observe racing shared variables. One example of a tolerable race is a repeated read of a shared reference. 
This only works if the algorithm takes no action based on the first read, other than deciding to perform the second read; it must "forget what it saw" in the first read. This is why the array-mismatch intrinsics can sometimes report a tentative search hit (maybe using vectorized code), which can then be confirmed (by scalar code) as the caller makes a fresh and independent observation. (This is done when the array mismatch logic performs NaN-folding. I just noticed that the NaN-folding code in ArraysSupport is slightly incorrect with respect to races!) ------------- PR Review: https://git.openjdk.org/jdk/pull/24777#pullrequestreview-2847502871 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093599254 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093605455 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093605395 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093605356 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093605323 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2093632716 From lucy at openjdk.org Fri May 16 20:32:51 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 16 May 2025 20:32:51 GMT Subject: RFR: 8356778: Compiler add event logging in case of failures [v2] In-Reply-To: References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: On Fri, 16 May 2025 10:23:42 GMT, Matthias Baesken wrote: >> We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be checked to avoid crashes. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Update c1_Compilation.cpp - remove null check in bailout LGTM. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25188#pullrequestreview-2847569516

From kvn at openjdk.org Fri May 16 21:41:52 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 16 May 2025 21:41:52 GMT
Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v6]
In-Reply-To:
References:
Message-ID:

On Wed, 14 May 2025 15:00:30 GMT, Manuel Hässig wrote:

>> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the names of the phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened.
>>
>>

>> Output with `-XX:PrintPhaseLevel=2` >> >> >>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java >> CompileCommand: compileonly TestLoop.test10 bool compileonly = true >> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true >> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) >> 3584 99 b 3 TestLoop::test10 (64 bytes) >> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. After Loop Optimizations >> 14. After Macro Expansion >> 15. Barrier expand >> 16. Optimize finished >> 17. Before matching >> 18. After matching >> 19. Global code motion >> 20. Register Allocation >> 21. Final Code >> 3668 103 b 4 TestLoop::test10 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. PhaseIdealLoop iterations 2 >> 14. PhaseIdealLoop iterations 3 >> 15. PhaseIdealLoop iterations 4 >> 16. PhaseIdealLoop iterations 5 >> 17. PhaseIdealLoop iterations 6 >> 18. PhaseIdealLoop iterations 7 >> 19. PhaseIdealLoop iterations 8 >> 20. PhaseIdealLoop iterations 9 >> 21. After Loop Optimizations >> 22. After Macro Expansion >> 23. Barrier expand >> 24. Optimize finished >> 25. Before matching >> 26. After matching >> 27. Global code motion >> 28. Registe... 
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
> Apply suggestions from @chhagedorn
>
> Co-authored-by: Christian Hagedorn

Good. Conversion to UL is not a simple change. Let's do that as a separate RFE.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25183#pullrequestreview-2847662910
PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2887720919

From kvn at openjdk.org Fri May 16 22:26:00 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 16 May 2025 22:26:00 GMT
Subject: RFR: 8357166: Many AOT tests failed with VM crash
Message-ID: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com>

Disable AOT runtime blobs generation when the VerifyOops flag is on. AOT adapters are not affected: they don't have oop operations.

Tested tier1 and tier7-comp (which failed without the fix).

-------------

Commit messages:
 - 8357166: Many AOT tests failed with VM crash

Changes: https://git.openjdk.org/jdk/pull/25277/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25277&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8357166
Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/25277.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/25277/head:pull/25277

PR: https://git.openjdk.org/jdk/pull/25277

From duke at openjdk.org Sat May 17 05:26:57 2025
From: duke at openjdk.org (kuaiwei)
Date: Sat, 17 May 2025 05:26:57 GMT
Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function
In-Reply-To:
References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com>
Message-ID:

On Thu, 8 May 2025 08:36:56 GMT, Christian Hagedorn wrote:

>> Good catch! It would currently only be a problem when we clone nodes which is probably hard to check statically (could, for example, be part of a loop body and then be cloned).
>> >> Some questions: >> - Have you also checked the Mach nodes? >> - Have you also checked that `cmp()` is overridden in case `hash()` is not `NO_HASH` for those nodes that specify at least one field? >> >> Just a side node, you can also just use `sizeof(*this)` which is often done in the code. > >> @chhagedorn I checked `machnode.hpp` manually and found some of them still miss `size_of()` . I added them in new patch. Thanks. > > Nice! > >> I checked node list in share/opto/classes.hpp, so MachNode/MachNullCheckNode/MachProjNode are checked. For mach nodes created by adlc, I found adlc will always add size_of function. > > Thanks for checking that. > >> I haven't checked cmp() and hash() , I will check if my test can cover these. > > Sounds good, thanks! @chhagedorn @TobiHartmann Thanks for your review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2888097786 From duke at openjdk.org Sat May 17 05:26:57 2025 From: duke at openjdk.org (duke) Date: Sat, 17 May 2025 05:26:57 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v5] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Fri, 16 May 2025 05:12:38 GMT, kuaiwei wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. 
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
> Minor change

@kuaiwei Your change (at version 08debbd75d35fc52c0738eaec876411b2e42d51c) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2888098202

From aph at openjdk.org Sat May 17 09:06:54 2025
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 17 May 2025 09:06:54 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID:

On Fri, 16 May 2025 14:59:18 GMT, Roberto Castañeda Lozano wrote:

>> src/hotspot/cpu/aarch64/gc/z/z_aarch64.ad line 131:
>>
>>> 129: !MacroAssembler::legitimize_address_requires_lea(ref_addr, size),
>>> 130: "an instruction that can be used for implicit null checking should emit the candidate memory access first");
>>> 131: ref_addr = __ legitimize_address(ref_addr, size, rscratch2);
>>
>> I just saw this. I think it might be simpler and better to handle this case in the segfault handler.
>
> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare.

I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry.
The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2094064464 From qamai at openjdk.org Sat May 17 14:59:38 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 17 May 2025 14:59:38 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII Message-ID: Hi, The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. Please take a look and leave your reviews, thanks a lot. ------------- Commit messages: - misplaced CastLL Changes: https://git.openjdk.org/jdk/pull/25284/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8355574 Stats: 13 lines in 3 files changed: 6 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From qamai at openjdk.org Sat May 17 15:14:36 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 17 May 2025 15:14:36 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v2] In-Reply-To: References: Message-ID: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. > > Please take a look and leave your reviews, thanks a lot. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25284/files - new: https://git.openjdk.org/jdk/pull/25284/files/5f4487d8..336c295c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=00-01 Stats: 12 lines in 2 files changed: 1 ins; 7 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From qamai at openjdk.org Sat May 17 20:33:12 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 17 May 2025 20:33:12 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v3] In-Reply-To: References: Message-ID: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. 
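The proposal in the description above, to compare `length` against `ArrayOperationPartialInlineSize / type2bytes(bt)` rather than comparing `length << log2(type2bytes(bt))` against `ArrayOperationPartialInlineSize`, depends on the two tests agreeing. A minimal sketch of that equivalence is below. It is not code from the patch: the function names are hypothetical, and it assumes a less-than-or-equal comparison, a non-negative length that does not overflow when shifted, and a power-of-two element size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the two forms of the runtime test.
// limit_bytes plays the role of ArrayOperationPartialInlineSize (in bytes),
// log2_elem_bytes the role of log2(type2bytes(bt)).

// Form 1: scale the element count up to bytes, then compare.
inline bool fits_by_bytes(int64_t length, int log2_elem_bytes, int64_t limit_bytes) {
  assert(length >= 0 && log2_elem_bytes >= 0);
  return (length << log2_elem_bytes) <= limit_bytes;
}

// Form 2: scale the byte limit down to elements, then compare.
// For integers, length * 2^s <= L holds exactly when length <= floor(L / 2^s).
inline bool fits_by_elements(int64_t length, int log2_elem_bytes, int64_t limit_bytes) {
  assert(length >= 0 && log2_elem_bytes >= 0);
  return length <= (limit_bytes >> log2_elem_bytes);
}
```

Under these assumptions the second form works directly in element counts, the same unit the description says `ArrayCopyNode::get_partial_inline_vector_lane_count` takes. With a strict less-than comparison the two forms would only agree when the byte limit is a multiple of the element size.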
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25284/files - new: https://git.openjdk.org/jdk/pull/25284/files/336c295c..72e72180 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=01-02 Stats: 34 lines in 3 files changed: 4 ins; 4 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From jbhateja at openjdk.org Sun May 18 01:43:05 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 18 May 2025 01:43:05 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 17:42:18 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Update x86-asmtest.py to enable demotion by default and make test generation optional Hi @vamsi-parasa , Kindly accomodate some re-factoring suggestions, overall patch looks good to me. 
Best Regards, Jatin src/hotspot/cpu/x86/assembler_x86.cpp line 1657: > 1655: void Assembler::eandl(Register dst, Register src1, Address src2, bool no_flags) { > 1656: InstructionMark im(this); > 1657: evex_prefix_int8_operand(dst, src1, src2, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0x23, no_flags); Nomenclature of routine is not very clear here, why do you mean by int8 operand, this is operating over 32bit word. src/hotspot/cpu/x86/assembler_x86.cpp line 6809: > 6807: > 6808: void Assembler::eshldl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) { > 6809: evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xA4, no_flags, true /* is_map1 */); Here the opcode is set to 0xA4 which is correct for demotion case, in case we don't demote then opcode is 0x24. This is only relevant to NDD flavours of shldl/shldq with imm8 shift values. I will suggest adding a comment here giving clear explanation as it not intiutive at first glance and manul clearly specify 0x24 as the opcode. src/hotspot/cpu/x86/assembler_x86.cpp line 6827: > 6825: > 6826: void Assembler::eshrdl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) { > 6827: evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAC, no_flags, true /* is_map1 */); Suggestion: emit_eevex_or_demote(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAC, no_flags, true /* is_map1 */); Above nomenclature looks more appropriate here. 
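The relationship between the two opcode bytes mentioned in the `eshldl` comment above (0x24 when the instruction is not demoted, 0xA4 when it is) can be made explicit with a small helper. The sketch below is hypothetical and is not part of `assembler_x86.cpp`; it only encodes the arithmetic fact that 0xA4 is 0x24 with bit 0x80 set, and likewise that the 0xAC used for `eshrdl` is 0x2C with bit 0x80 set.

```cpp
#include <cstdint>

// Hypothetical helper: given the opcode byte used by the non-demoted (NDD)
// form of a double-shift with an imm8 count, return the byte to emit.
// When demote is true, bit 0x80 is set to obtain the demoted opcode.
constexpr uint8_t select_shift_double_opcode(uint8_t ndd_opcode, bool demote) {
  return demote ? static_cast<uint8_t>(ndd_opcode | 0x80) : ndd_opcode;
}

// shld with imm8: 0x24 stays 0x24 when not demoted, becomes 0xA4 when demoted.
static_assert(select_shift_double_opcode(0x24, false) == 0x24, "NDD form keeps 0x24");
static_assert(select_shift_double_opcode(0x24, true) == 0xA4, "demoted form uses 0xA4");
```

With a helper of this shape, a caller could pass the opcode as written for the non-demoted form and leave the adjustment to the demotion path, which would address the readability concern raised in the comment.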
src/hotspot/cpu/x86/assembler_x86.cpp line 6935: > 6933: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 6934: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); > 6935: evex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, &attributes, no_flags); To comply with existing convention like below https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.hpp#L762 We should use eevex instead of evex as the prefix src/hotspot/cpu/x86/assembler_x86.cpp line 12857: > 12855: > 12856: void Assembler::evex_prefix_int8_operand(Register dst, Register src1, Address src2, VexSimdPrefix pre, VexOpcode opc, > 12857: int size, int b1, bool no_flags, bool is_map1) { Suggestion: int size, int b1, bool no_flags, bool is_map1) { Suggestion: int size, int opcode_byte, bool no_flags, bool is_map1) { src/hotspot/cpu/x86/assembler_x86.cpp line 12942: > 12940: } > 12941: > 12942: void Assembler::evex_opcode_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, int8_t imm8, VexSimdPrefix pre, VexOpcode opc, Only difference b/w this method and one below is that it accepts an immediate shift count and modifies the opcode if we domete the instruction to legacy / REX2 variant. Demotion logic and rest of the logic is exaclty same. Should we merge these into one and then based on the incoming opcode i.e. if its 0x24 or 0x2C we chonsider immediate shift and associated opcode pruning if demoted. src/hotspot/cpu/x86/assembler_x86.cpp line 12948: > 12946: int encode = is_prefixq ? 
prefixq_and_encode(src_enc, dst_enc, is_map1) : prefix_and_encode(src_enc, dst_enc, is_map1); > 12947: emit_opcode_prefix_and_encoding((unsigned char)byte1, 0xC0, encode, imm8); > 12948: } else { FTR, existing demotion w.r.t to first operand is safe for all kinds to instructions, for commutative instructions, add, mul, xor, and, or, max , min etc, we can check against the second operand by passing is_commutative flags from top level assembler instruction. I am ok to handle this as part of https://bugs.openjdk.org/browse/JDK-8354348 src/hotspot/cpu/x86/assembler_x86.cpp line 12958: > 12956: void Assembler::evex_opcode_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, > 12957: int size, int byte1, bool no_flags, bool is_map1) { > 12958: bool is_prefixq = (size == EVEX_64bit); Nit pick, on line https://github.com/openjdk/jdk/pull/24431/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aR12944 you are using a conditional operator to select b/w true and false., Lets follow one convention. src/hotspot/cpu/x86/assembler_x86.cpp line 12962: > 12960: if (size == EVEX_16bit) { > 12961: emit_int8(0x66); > 12962: } I cannot find a caller that passes EVEX_16bit for the size argument. src/hotspot/cpu/x86/assembler_x86.cpp line 12973: > 12971: } > 12972: > 12973: void Assembler::evex_opcode_prefix_and_encode_swap(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, Can we also not unify this one with evex_opcode_prefix_and_encode by passing additional swap argument src/hotspot/cpu/x86/assembler_x86.cpp line 12991: > 12989: > 12990: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, > 12991: InstructionAttr *attributes, bool no_flags, bool use_prefixq) { evex_prefix_and_encode_ndd => emit_eevex_prefix_or_demote_ndd Naming suggestion. 
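The commutative-operand idea from the comment on line 12948 above (check the second source as well when the instruction is commutative) can be sketched as a small decision function. This is an illustration only: `DemotionChoice` and `choose_demotion` are hypothetical names, and the real logic in the PR also has to deal with prefixes and opcode maps, which are left out here.

```cpp
#include <cassert>

// Hypothetical result type: whether the instruction can be demoted to a
// two-operand encoding, and which source register encoding remains as the
// explicit source operand (-1 when no demotion is possible).
struct DemotionChoice {
  bool demote;
  int  remaining_src_enc;
};

// Demote when dst equals the first source. For commutative operations
// (add, and, or, xor, ...), also demote when dst equals the second source,
// in which case the first source becomes the remaining operand.
inline DemotionChoice choose_demotion(int dst_enc, int src1_enc, int src2_enc,
                                      bool is_commutative) {
  if (dst_enc == src1_enc) {
    return {true, src2_enc};
  }
  if (is_commutative && dst_enc == src2_enc) {
    return {true, src1_enc};
  }
  return {false, -1};
}
```

For example, `choose_demotion(18, 18, 25, false)` matches the `eaddl r18, r18, r25` case from the PR description, while `choose_demotion(18, 25, 18, true)` models the additional case a commutativity flag would allow.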
src/hotspot/cpu/x86/assembler_x86.cpp line 13002: > 13000: } > 13001: > 13002: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc, Suggestion: int Assembler::emit_eevex_prefix_or_demote_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc, Nameing suggetion src/hotspot/cpu/x86/assembler_x86.cpp line 13024: > 13022: } > 13023: > 13024: void Assembler::evex_prefix_arith(Register dst, Register nds, int32_t imm32, VexSimdPrefix pre, VexOpcode opc, Suggestion: void Assembler::emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register nds, int32_t imm32, VexSimdPrefix pre, VexOpcode opc, ------------- PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2847341916 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094104970 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094296385 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094297820 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2093533374 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094299275 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094299023 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2093497917 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2093540188 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094102193 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094300060 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094300860 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094300471 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094301333 From jbhateja at openjdk.org Sun May 18 01:43:05 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 18 May 2025 01:43:05 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion 
when src and dst are the same [v27]
In-Reply-To:
References:
Message-ID: <4xANhUPLFW33T1AmImCqr0L7LjQxdtmzdBULh6T5bEk=.95918caf-b803-42b6-8f16-4daa0f0028d0@github.com>

On Sun, 18 May 2025 01:04:45 GMT, Jatin Bhateja wrote:

>> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Update x86-asmtest.py to enable demotion by default and make test generation optional
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 6809:
>
>> 6807:
>> 6808: void Assembler::eshldl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) {
>> 6809: evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xA4, no_flags, true /* is_map1 */);
>
> Here the opcode is set to 0xA4, which is correct for the demotion case; if we don't demote, the opcode is 0x24. This is only relevant to NDD flavours of shldl/shldq with imm8 shift values. I suggest adding a comment here with a clear explanation, as it is not intuitive at first glance and the manual clearly specifies 0x24 as the opcode.

A better idea is to pass the 0x24 opcode from the top level, which is what the manual says, and make the appropriate adjustment by ORing with 0x80 if we demote it to a REX2 / REX prefixed instruction.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2094296930

From fjiang at openjdk.org Sun May 18 03:48:53 2025
From: fjiang at openjdk.org (Feilong Jiang)
Date: Sun, 18 May 2025 03:48:53 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10]
In-Reply-To:
References:
Message-ID:

On Fri, 16 May 2025 08:11:16 GMT, Anjian-Wen wrote:

>> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory.
This intrinsic considerably reduces unsafe setmemory time.
>>
>> On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below
>>
>> before the patch
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
>> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
>> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
>> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
>> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
>> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
>> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
>> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
>> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
>> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
>> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
>> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
>> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
>> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
>> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
>> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
>> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
>> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
>> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/o...
> > Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > RISC-V: Intrinsify Unsafe::setMemory src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1665: > 1663: Label L_exit, L_fill_elements, L_loop; > 1664: > 1665: const Register to = c_rarg0; As the comments said before, we can use `dest` instead of `to`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1743: > 1741: // Handle copies less than 8 bytes > 1742: __ bind(L_fill_elements); > 1743: __ beqz(count, L_exit); If `count` may be zero, we can put `beqz` at the beginning of the stub. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1749: > 1747: __ addi(to, to, 1); > 1748: __ subi(count, count, 1); > 1749: __ bnez(count, L_loop); If we unroll the byte storage, will there be additional performance gains when the "count" is less than 8? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094323633 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094342362 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094344728 From qamai at openjdk.org Sun May 18 07:06:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 18 May 2025 07:06:41 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: > Hi, > > The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. 
In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - fix comment - fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25284/files - new: https://git.openjdk.org/jdk/pull/25284/files/72e72180..fdbb88bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25284&range=02-03 Stats: 4 lines in 2 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25284.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25284/head:pull/25284 PR: https://git.openjdk.org/jdk/pull/25284 From mbaesken at openjdk.org Sun May 18 11:34:55 2025 From: mbaesken at openjdk.org (Matthias Baesken) Date: Sun, 18 May 2025 11:34:55 GMT Subject: Integrated: 8356778: Compiler add event logging in case of failures In-Reply-To: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> References: <2ADILKC05CmadEbbFmEJ9HIrkDEY0mfPc2XkumnuGMI=.3935341f-d8ef-4702-8b84-9aa4c7c36c2c@github.com> Message-ID: On Mon, 12 May 2025 17:56:47 GMT, Matthias Baesken wrote: > We should add event logging to some related hotspot methods. While testing this functionality it turned out that sometimes the CompileTask pointer is 0, so this needs to be check to avoid crashes. This pull request has now been integrated. 
Changeset: 6c42856b Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/6c42856b8d5039c14ba04a48c60d09039d5030fe Stats: 22 lines in 3 files changed: 21 ins; 0 del; 1 mod 8356778: Compiler add event logging in case of failures Reviewed-by: lucy ------------- PR: https://git.openjdk.org/jdk/pull/25188 From duke at openjdk.org Sun May 18 11:54:52 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sun, 18 May 2025 11:54:52 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 02:41:13 GMT, Feilong Jiang wrote: >> Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> RISC-V: Intrinsify Unsafe::setMemory > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1665: > >> 1663: Label L_exit, L_fill_elements, L_loop; >> 1664: >> 1665: const Register to = c_rarg0; > > As the comments said before, we can use `dest` instead of `to`. Thanks for reviews, I will change it later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094500862 From duke at openjdk.org Sun May 18 11:59:00 2025 From: duke at openjdk.org (Anjian-Wen) Date: Sun, 18 May 2025 11:59:00 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 03:19:42 GMT, Feilong Jiang wrote: >> Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> RISC-V: Intrinsify Unsafe::setMemory > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1743: > >> 1741: // Handle copies less than 8 bytes >> 1742: __ bind(L_fill_elements); >> 1743: __ beqz(count, L_exit); > > If `count` may be zero, we can put `beqz` at the beginning of the stub. 
Thanks, but I think cases where count is equal to 0 are relatively rare compared to cases where it is not, so it may not be worth executing a beqz at the entrance?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094502075

From duke at openjdk.org Sun May 18 12:16:44 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Sun, 18 May 2025 12:16:44 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v11]
In-Reply-To:
References:
Message-ID: <9sRN50M8x8owgFSEhEbU3qctoPBAhZm6GkHexAw0uWQ=.a2b9e7bc-df09-4bfc-a4a0-e862b7fcd139@github.com>

> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic considerably reduces unsafe setmemory time.
>
> On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below
>
> before the patch
>
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...

Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision:

  change the name of dest from 'to' to 'dest'

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23890/files
  - new: https://git.openjdk.org/jdk/pull/23890/files/0b03bb2a..61648e2e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=10
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=09-10

Stats: 20 lines in 1 file changed: 1 ins; 0 del; 19 mod
Patch: https://git.openjdk.org/jdk/pull/23890.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890

PR: https://git.openjdk.org/jdk/pull/23890

From duke at openjdk.org Sun May 18 12:16:46 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Sun, 18 May 2025 12:16:46 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10]
In-Reply-To:
References:
Message-ID:

On Sun, 18 May 2025 03:40:05 GMT, Feilong Jiang wrote:

>> Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit: >> >> RISC-V: Intrinsify Unsafe::setMemory > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1749: > >> 1747: __ addi(to, to, 1); >> 1748: __ subi(count, count, 1); >> 1749: __ bnez(count, L_loop); > > If we unroll the byte storage, will there be additional performance gains when the `count` is less than 8? yes, I think normally if we unroll the byte storage we can gains additional performance. But sometime the dest address may not be aligned with the count, make the performance very poor on some align sensitive hardware. An additional alignment is required, it seems store the bytes one by one with a loop may be a simple way with limited performance loss compare with it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094505521 From duke at openjdk.org Sun May 18 23:30:55 2025 From: duke at openjdk.org (kuaiwei) Date: Sun, 18 May 2025 23:30:55 GMT Subject: Integrated: 8356328: Some C2 IR nodes miss size_of() function In-Reply-To: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Wed, 7 May 2025 07:04:26 GMT, kuaiwei wrote: > I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. > > PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. This pull request has now been integrated. 
Changeset: 9927ec0b Author: Kuai Wei Committer: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/9927ec0b91775db342b2bbc1937253325c367a19 Stats: 28 lines in 4 files changed: 20 ins; 4 del; 4 mod 8356328: Some C2 IR nodes miss size_of() function Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25081 From haosun at openjdk.org Mon May 19 00:01:51 2025 From: haosun at openjdk.org (Hao Sun) Date: Mon, 19 May 2025 00:01:51 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v3] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 15:30:43 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Remove additional spaces in the aarch64_vector_ad.m4 file Thanks for your update. LGTM. ------------- Marked as reviewed by haosun (Committer). PR Review: https://git.openjdk.org/jdk/pull/25096#pullrequestreview-2849130327 From dzhang at openjdk.org Mon May 19 00:45:52 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 19 May 2025 00:45:52 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 08:30:07 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. >> Thanks! 
>>
>> ## Test
>>
>> Performance data
>>
>> Benchmark | (vectorDim) | Mode | Cnt | Score - patch | Score - master | Improvement (master/patch) | Error | Units
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 382.123 | 2595.718 | 6.793 | 0.631 | ns/op
>> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 563.726 | 5167.687 | 9.167 | 0.063 | ns/op
>> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 888.455 | 9468.714 | 10.658 | 0.147 | ns/op
>> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 1540.255 | 18879.796 | 12.258 | 0.396 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 256 | avgt | 10 | 579.959 | 4028.335 | 6.946 | 0.008 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 512 | avgt | 10 | 914.634 | 8034.234 | 8.784 | 0.027 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 1024 | avgt | 10 | 1494.017 | 15125.924 | 10.124 | 0.292 | ns/op
>> Float16OperationsBenchmark.divBenchmark | 2048 | avgt | 10 | 2728.517 | 30197.97 | 11.068 | 32.869 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 256 | avgt | 10 | 476.764 | 2817.035 | 5.909 | 0.012 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 512 | avgt | 10 | 707.035 | 5239.438 | 7.41 | 0.129 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 1024 | avgt | 10 | 1114.29 | 7361.105 | 6.606 | 0.024 | ns/op
>> Float16OperationsBenchmark.fmaBenchmark | 2048 | avgt | 10 | 1931.713 | 14465.602 | 7.488 | 1.852 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 256 | avgt | 10 | 501.892 | 3754.563 | 7.481 | 0.408 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 512 | avgt | 10 | 738.148 | 7450.666 | 10.094 | 1.206 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 1024 | avgt | 10 | 1195.262 | 15463.892 | 12.938 | 8.889 | ns/op
>> Float16OperationsBenchmark.maxBenchmark | 2048 | avgt | 10 | 2253.656 | 30649.239 | 13.6 | 6.154 | ns/op
>> Float16OperationsBenchmark.minBenchmark | 256 | avgt | 10 | 501.873 | 3753.9 | 7.48 ...
>
> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision:
>
>   minor

LGTM, thanks!

-------------

Marked as reviewed by dzhang (Author).

PR Review: https://git.openjdk.org/jdk/pull/25181#pullrequestreview-2849151661

From xgong at openjdk.org Mon May 19 03:13:55 2025
From: xgong at openjdk.org (Xiaohong Gong)
Date: Mon, 19 May 2025 03:13:55 GMT
Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API
In-Reply-To:
References:
Message-ID:

On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote:

> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]).
>
> Two key areas require improvement:
> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance.
> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance.
>
> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures.
>
> Main changes:
> 1. Java-side API refactoring:
>    - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on architectures like AArch64.
> 2.
C2 compiler IR refactoring:
>    - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types.
> 3. Backend changes:
>    - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level.
>
> Performance:
> The performance of the relevant JMH benchmarks improves by up to 27% on an X86 AVX512 system. Please see the data below:
>
> Benchmark                                     Mode   Cnt  Unit    SIZE     Before      After  Gain
> GatherOperationsBenchmark.microByteGather128  thrpt   30  ops/ms    64  53682.012  52650.325  0.98
> GatherOperationsBenchmark.microByteGather128  thrpt   30  ops/ms   256  14484.252  14255.156  0.98
> GatherOperationsBenchmark.microByteGather128  thrpt   30  ops/ms  1024   3664.900   3595.615  0.98
> GatherOperationsBenchmark.microByteGather128  thrpt   30  ops/ms  4096    908.312    935.269  1.02
> GatherOperationsBenchmark.micr...

Ping again~ could anyone please take a look at this PR? Thanks a lot!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2889487313

From fyang at openjdk.org Mon May 19 03:43:00 2025
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 19 May 2025 03:43:00 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v3]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID: <9hPoS71U0nWv19FLVE7E5Q1SUTRziIPqhoATcrPAe0E=.0397e761-048e-436e-bbd6-dbfb418fa76f@github.com>

On Fri, 16 May 2025 08:02:16 GMT, Roberto Castañeda Lozano wrote:

>> Two small nits/questions, but otherwise ready from my side :)
>
>> Two small nits/questions, but otherwise ready from my side :)
>
> Thanks again for reviewing @eme64, I have addressed your questions now. And thanks also for your review @vnkozlov.
> > @stefank @fisk @xmas92 @jsikstro may I get a review from the GC side? > > @RealFYang @TheRealMDoerr note that this PR also introduces implicit null check support for ZGC loads in RISC-V and PPC, but I cannot test it beyond GHA. May I ask you to test the changes on your respective platforms? (or let me know if you prefer to add the support in separate PRs). @robcasloz : Hi, Thanks for the ping! I performed tier1-3 tests on linux-riscv64 platform, result is good. The new test `test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java` also pass when running with G1 and ZGC using fastdebug build. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2889518370 From jkarthikeyan at openjdk.org Mon May 19 05:01:43 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 19 May 2025 05:01:43 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 [v2] In-Reply-To: References: Message-ID: <2RgkiVAWZ2fjJ-V4DP68sSiGAe4GzGAujK2t5yhYpyQ=.e26331df-e781-4b4c-856d-edde038c2389@github.com> > Hi all, > This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. > > Reviews would be appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Change requires to run test on Graal as well ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25243/files - new: https://git.openjdk.org/jdk/pull/25243/files/a35fbef4..3a7031fd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25243&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25243&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25243.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25243/head:pull/25243 PR: https://git.openjdk.org/jdk/pull/25243 From jkarthikeyan at openjdk.org Mon May 19 05:01:44 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 19 May 2025 05:01:44 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Thu, 15 May 2025 14:12:31 GMT, Emanuel Peter wrote: >> Hi all, >> This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. >> >> Reviews would be appreciated! > > @jaskarth @chhagedorn Hmm, this means that this test could not be run with GraalVM, for example. That's a shame. > Do we not have some way to assert that there should be "some fast compiler"? Ah, what does `vm.flavor == "server"` do? 
Because I have seen lines like `vm.flavor == "server" & !vm.graal.enabled` before. @eme64 I think `vm.flavor == "server"` ensures that the test runs on server-class VMs, but I wasn't sure if manually setting the `TieredStopAtLevel` changes that designation. I looked at some more tests and I saw that they were using `@requires vm.flavor == "server" & (vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4)`, which looks like a good way to check for any tier4 compiler. I've updated the patch to use that method instead, which should let it run on graal as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2889616594 From thartmann at openjdk.org Mon May 19 05:42:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 May 2025 05:42:50 GMT Subject: RFR: 8357166: Many AOT tests failed with VM crash In-Reply-To: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> References: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> Message-ID: On Fri, 16 May 2025 22:22:21 GMT, Vladimir Kozlov wrote: > Disable AOT runtime blobs generation when VerifyOops flag is on. > AOT adapters are not affected - they don't have oops operation. > > Tested tier1 and tier7-comp (which failed without fix) Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25277#pullrequestreview-2849435850

From fjiang at openjdk.org Mon May 19 06:11:55 2025
From: fjiang at openjdk.org (Feilong Jiang)
Date: Mon, 19 May 2025 06:11:55 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10]
In-Reply-To:
References:
Message-ID:

On Sun, 18 May 2025 11:56:24 GMT, Anjian-Wen wrote:

>> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1743:
>>
>>> 1741:     // Handle copies less than 8 bytes
>>> 1742:     __ bind(L_fill_elements);
>>> 1743:     __ beqz(count, L_exit);
>>
>> If `count` may be zero, we can put `beqz` at the beginning of the stub.
>
> Thanks, but I think there may be relatively few cases where count is equal to 0 compared to not equal to 0, and it may not be worth executing a beqz at the entrance?

This makes sense, so let's just leave it as it is.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094905144

From jbhateja at openjdk.org Mon May 19 06:11:59 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Mon, 19 May 2025 06:11:59 GMT
Subject: RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 13 Feb 2025 09:18:00 GMT, Emanuel Peter wrote:

>>> > @jatin-bhateja Doing the transformation to `AndF` would be a more general solution and thus better.
>>> > > Introducing another new IR "AndF" will again need changes in auto-vectorizer.
>>> >
>>> >
>>> > But currently, `CopySign` and `MoveF2I` are not vectorized anyway so we can do the vectorization of `AndF` in a separate patch without much hassle. `AndF` is vectorized into existing `AndV` nicely so it is not a too complicated work.
>>>
>>> Yes, I have a follow-up patch to auto-vectorized CopySign.
>>>
>>> > > this patch does not break existing IR invariants
>>> >
>>> >
>>> > Also, what invariant can be broken by transforming `AndI(MoveF2I(x), MoveF2I(y)` into `MoveF2I(AndF(x, y))`?
>>> >>> Hi @merykitty , I meant that in the context of CopySign, targets emit efficient instruction sequences for existing IR (CopySignF/D), this patch simply tuned x86 backend implementation to improve performance. >> >> >> Also currently, logical And mask is a long value, in case we opt-in for new AndF/D node creation, to preserve the IR semantics we would also need to perform an integral to floating point constant conversion, this will incur additional memory load penalty since floating-point constants are emitted into the constant table before native method body. >> >> For the time being, taking CopySign intrinsic route looks reasonable. > > @jatin-bhateja let me know when this is ready for more testing / review. > > Quick comment: it seems you are not just optimizing Math.copySign as the PR title says, but also adding vector nodes. Maybe you should update the PR title? Have not looked at the code in detail to suggest a better one yet ;) Hi @eme64 , your comments are addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23386#issuecomment-2889732253 From fjiang at openjdk.org Mon May 19 06:26:51 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 19 May 2025 06:26:51 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 12:10:59 GMT, Anjian-Wen wrote: >> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1749: >> >>> 1747: __ addi(to, to, 1); >>> 1748: __ subi(count, count, 1); >>> 1749: __ bnez(count, L_loop); >> >> If we unroll the byte storage, will there be additional performance gains when the `count` is less than 8? > > yes, I think normally if we unroll the byte storage we can gains additional performance. But sometime the dest address may not be aligned with the count, make the performance very poor on some align sensitive hardware. 
An additional alignment is required, it seems store the bytes one by one with a loop may be a simple way with limited performance loss compare with it?

The current version is okay. I mean, we can unroll the storage bytes and save some "bnez" by eliminating the loop, something like this:

    bind(unroll_4);
    test_bit(tmp, count, 2);
    beqz(tmp, unroll_2);
    sb(value, Address(dest, 0));
    sb(value, Address(dest, 1));
    sb(value, Address(dest, 2));
    sb(value, Address(dest, 3));
    addi(dest, dest, 4);
    subi(count, count, 4);

    bind(unroll_2);
    test_bit(tmp, count, 1);
    beqz(tmp, unroll_1);
    sb(value, Address(dest, 0));
    sb(value, Address(dest, 1));
    addi(dest, dest, 2);
    subi(count, count, 2);

    bind(unroll_1);
    test_bit(tmp, count, 0);
    beqz(tmp, end);
    sb(value, Address(dest, 0));
    addi(dest, dest, 1);
    subi(count, count, 1);

    bind(end);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094920943

From rehn at openjdk.org Mon May 19 06:30:26 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Mon, 19 May 2025 06:30:26 GMT
Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width
Message-ID:

Hi, please consider.

While working on https://github.com/openjdk/jdk/pull/25252, I noticed:
- Major op code was just repeated
- Width coded in binary
- Stores have mixed up rs1 and rs2
- Bonus, fsd used a macro for no reason

I think this improves readability.
Tested tier1

Thanks, Robbin

-------------

Commit messages:
 - Fixes

Changes: https://git.openjdk.org/jdk/pull/25253/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25253&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357056
  Stats: 148 lines in 1 file changed: 87 ins; 28 del; 33 mod
  Patch: https://git.openjdk.org/jdk/pull/25253.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25253/head:pull/25253

PR: https://git.openjdk.org/jdk/pull/25253

From duke at openjdk.org Mon May 19 06:47:53 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Mon, 19 May 2025 06:47:53 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10]
In-Reply-To:
References:
Message-ID:

On Mon, 19 May 2025 06:22:21 GMT, Feilong Jiang wrote:

>> yes, I think normally if we unroll the byte storage we can gains additional performance. But sometime the dest address may not be aligned with the count, make the performance very poor on some align sensitive hardware. An additional alignment is required, it seems store the bytes one by one with a loop may be a simple way with limited performance loss compare with it?
>
> The current version is okay. I mean, we can unroll the storage bytes and save some "bnez" by eliminating the loop, something like this:
>
>     bind(unroll_4);
>     test_bit(tmp, count, 2);
>     beqz(tmp, unroll_2);
>     sb(value, Address(dest, 0));
>     sb(value, Address(dest, 1));
>     sb(value, Address(dest, 2));
>     sb(value, Address(dest, 3));
>     addi(dest, dest, 4);
>     subi(count, count, 4);
>
>     bind(unroll_2);
>     test_bit(tmp, count, 1);
>     beqz(tmp, unroll_1);
>     sb(value, Address(dest, 0));
>     sb(value, Address(dest, 1));
>     addi(dest, dest, 2);
>     subi(count, count, 2);
>
>     bind(unroll_1);
>     test_bit(tmp, count, 0);
>     beqz(tmp, end);
>     sb(value, Address(dest, 0));
>     addi(dest, dest, 1);
>     subi(count, count, 1);
>
>     bind(end);

I understand, that makes sense. It seems this can save 4 jumps when count is 7. I will test that later, thanks!!
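
For readers following the thread, here is a minimal Java model of the unrolled tail quoted above. It is only a sketch of the idea, not the stub itself: the class `UnrolledTailFill` and the method `fillTail` are hypothetical names introduced for illustration, and a `byte[]` plus an index stand in for the `dest` register. It assumes `count` is in the range 0..7, as in the "less than 8 bytes" path.

```java
// Hypothetical model of the unrolled tail: test bits 2, 1 and 0 of count
// and emit 4, 2 and 1 byte stores respectively, with no loop.
final class UnrolledTailFill {

    // Writes `count` (assumed 0..7) copies of `value` into `buf` starting at
    // `offset`, and returns the index just past the last byte written.
    static int fillTail(byte[] buf, int offset, int count, byte value) {
        int dest = offset;
        if ((count & 4) != 0) {          // corresponds to unroll_4
            buf[dest] = value;
            buf[dest + 1] = value;
            buf[dest + 2] = value;
            buf[dest + 3] = value;
            dest += 4;
        }
        if ((count & 2) != 0) {          // corresponds to unroll_2
            buf[dest] = value;
            buf[dest + 1] = value;
            dest += 2;
        }
        if ((count & 1) != 0) {          // corresponds to unroll_1
            buf[dest] = value;
            dest += 1;
        }
        return dest;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        int end = fillTail(buf, 0, 7, (byte) 1);
        System.out.println("bytes written: " + end);
    }
}
```

In this model every call goes through exactly three bit tests, whereas a byte-by-byte loop takes one backward-branch test per byte stored. For a count of 7 that is three tests instead of seven, which matches the "4 jumps" estimate in the reply. Whether this is a win on real hardware is exactly what would need to be measured.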
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2094953863 From epeter at openjdk.org Mon May 19 06:50:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 06:50:59 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 [v2] In-Reply-To: <2RgkiVAWZ2fjJ-V4DP68sSiGAe4GzGAujK2t5yhYpyQ=.e26331df-e781-4b4c-856d-edde038c2389@github.com> References: <2RgkiVAWZ2fjJ-V4DP68sSiGAe4GzGAujK2t5yhYpyQ=.e26331df-e781-4b4c-856d-edde038c2389@github.com> Message-ID: On Mon, 19 May 2025 05:01:43 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. >> >> Reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Change requires to run test on Graal as well Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25243#pullrequestreview-2849580311 From epeter at openjdk.org Mon May 19 06:50:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 06:50:59 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 19 May 2025 04:57:53 GMT, Jasmine Karthikeyan wrote: >> @jaskarth @chhagedorn Hmm, this means that this test could not be run with GraalVM, for example. That's a shame. 
>> Do we not have some way to assert that there should be "some fast compiler"? Ah, what does `vm.flavor == "server"` do? Because I have seen lines like `vm.flavor == "server" & !vm.graal.enabled` before. > > @eme64 I think `vm.flavor == "server"` ensures that the test runs on server-class VMs, but I wasn't sure if manually setting the `TieredStopAtLevel` changes that designation. I looked at some more tests and I saw that they were using `@requires vm.flavor == "server" & (vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4)`, which looks like a good way to check for any tier4 compiler. I've updated the patch to use that method instead, which should let it run on graal as well. @jaskarth Thanks for doing the research on this, nice to see that there is a solution :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2889815708 From epeter at openjdk.org Mon May 19 06:58:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 06:58:59 GMT Subject: RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v4] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 17:28:48 GMT, Jatin Bhateja wrote: >> Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature. >> Intel E-core Xeons support only the AVX2 feature set and still compile Java implementation which is composed of logical operations. >> >> Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before being operated upon by logical operation there is an opportunity to optimize this using an efficient instruction sequence. >> >> Patch uses ANDPS and ANDPD logical instruction to generate efficient instruction sequences to absorb domain copy over penalty. Also, performs minor tuning for existing AVX512 instruction sequence based on VPTERNLOG instruction. 
>> >> Following are the performance numbers of the following existing microbenchmark >> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java >> >> Patch passes following validation test >> [test/jdk/java/lang/Math/IeeeRecommendedTests.java >> ](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java) >> >> >> Granite Rapids-AP (P-core Xeon) >> Baseline AVX512: >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 1296.141 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 838.954 ops/ns >> >> Withopt : >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 940.240 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 967.370 ops/ns >> >> Baseline AVX2: >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 63.673 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 26.898 ops/ns >> >> Withopt : >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 785.801 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 558.710 ops/ns >> >> Sierra Forest (E-core Xeon) >> Baseline: >> Benchmark (seed) Mode Cnt Score Error Units >> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 40.528 ops/ns >> o.o.b.vm.compiler.Signum._7_copySignDoubleTest N/A thrpt 2 25.101 ops/ns >> >> Withopt: >> Benchmark (seed) Mode Cnt Score Error Units >> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 676.... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8349138 > - Adding vector support along with some refactoring. > - Adding IR framework verification test > - 8349138: Optimize Math.copySign API for Intel e-core and p-core targets Changes requested by epeter (Reviewer). 
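
As background for the quoted description, the "logical operations" formulation of copySign can be sketched in plain Java as below. This is an illustrative sketch, not the JDK source: the class name `CopySignBits` is hypothetical, and the code simply combines the sign bit of one operand with the remaining bits of the other using the standard `Float`/`Double` raw-bit conversions.

```java
// Hypothetical sketch: copySign expressed as bitwise AND/OR on the raw
// IEEE 754 encodings, which is the shape the compiler has to handle.
final class CopySignBits {

    static float copySign(float magnitude, float sign) {
        int signBit = Float.floatToRawIntBits(sign) & 0x80000000;      // keep only the sign bit
        int magBits = Float.floatToRawIntBits(magnitude) & 0x7FFFFFFF; // clear the sign bit
        return Float.intBitsToFloat(signBit | magBits);
    }

    static double copySign(double magnitude, double sign) {
        long signBit = Double.doubleToRawLongBits(sign) & 0x8000000000000000L;
        long magBits = Double.doubleToRawLongBits(magnitude) & 0x7FFFFFFFFFFFFFFFL;
        return Double.longBitsToDouble(signBit | magBits);
    }

    public static void main(String[] args) {
        System.out.println(copySign(1.5f, -0.0f));
        System.out.println(copySign(-2.0, 3.0));
    }
}
```

The float-to-int and int-to-float conversions in this form are where the floating-point to GPR domain crossing mentioned in the description comes from. Keeping the masking in the floating-point domain (ANDPS/ANDPD on x86, per the description) avoids that round trip.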
test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 86: > 84: fsign[i] = genFloat.next(); > 85: dsign[i] = genFloat.next(); > 86: } Two comments: - You probably wanted to use the double generator for the double arrays, right? - You can fill a whole array directly with e.g. `Generators.G.fill(genFloat, fmagniture)`. test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 105: > 103: public void checkCopySignF() { > 104: for (int i = 0; i < SIZE; i++) { > 105: Verify.checkEQ(afresult[i], efresult[i]); You could add a comment that we consider NaN with different encoding as the same value. ------------- PR Review: https://git.openjdk.org/jdk/pull/23386#pullrequestreview-2849597497 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2094967399 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2094970598 From mchevalier at openjdk.org Mon May 19 07:01:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 19 May 2025 07:01:52 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <7e0IhYYv_1dDlLgmUM8rKj5bjDx3lIhY2PRt-fC-rTs=.35437a80-80c7-4332-9339-a6f047b73289@github.com> References: <7e0IhYYv_1dDlLgmUM8rKj5bjDx3lIhY2PRt-fC-rTs=.35437a80-80c7-4332-9339-a6f047b73289@github.com> Message-ID: On Tue, 13 May 2025 03:12:29 GMT, Quan Anh Mai wrote: >>> I think a very simple approach you can take is having CallPureNode as a pure data node >> >> It's not as simple as it seems. In order to work reliably it requires full control of the code being called, so without extra work it is appropriate for generated stubs only. If you want to call some native code VM doesn't control, then either all caller-saved registers should be preserved across the call (which may be prohibitively expensive) or it should be made explicit there's a call taking place so all ABI effects are taken into account. 
>
> @iwanowww I believe `effect(CALL)` marks that a call is taking place and the register allocator will know how to save the registers accordingly. Note that on arm, long division is implemented as a call:
>
> https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/cpu/arm/arm.ad#L5962
>
> And `SharedRuntime::ldiv` is implemented in C++:
>
> https://github.com/openjdk/jdk/blob/adebfa7ffda6383f5793278ced14a193066c5f6a/src/hotspot/share/runtime/sharedRuntime.cpp#L272

I like @merykitty's suggestion, but I don't understand how bad its disadvantages are. Commoning can be prevented as you mentioned above. As for scheduling, isn't it the same problem for many nodes? If we have something like

    var x = anObject.aField; // anObject known to be not null
    if (flag) { // flag independent of `anObject`
        // something with x
    } else {
        // [...] nothing with x
    }

I don't think there is any ordering between the if and the definition of `x`, and so we should push the latter under the if. And conversely, if the declaration is already in the branch in the original code, we should not let it float above. Or in case of loop, we should rather put it outside as much as possible. But none of that seems enforced by edges: memory node is not a CFG node, the nodes of the `if(flag)` might not use memory (so no memory edges)... The same would be true for an arithmetic node (like `AddI`, for instance), but we could argue those are cheap (even if in a loop, cheap becomes expensive), while a memory access is not that cheap.

So, don't the problems we have with @merykitty's pure-call-as-pure-data-node suggestion already exist for other node kinds? And if we would have troubles with scheduling of pure calls, shouldn't we have this kind of issue already?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2889840427 From qamai at openjdk.org Mon May 19 07:24:01 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 19 May 2025 07:24:01 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: <4vbXpgvmXv6Ba1fEkMKIRpUnXZ-QVdAZ7rgicqxVhpM=.7dda802c-9b8a-459d-9bd7-7a83d9fc1744@github.com> References: <4vbXpgvmXv6Ba1fEkMKIRpUnXZ-QVdAZ7rgicqxVhpM=.7dda802c-9b8a-459d-9bd7-7a83d9fc1744@github.com> Message-ID: <_iOKkIEZDrhUNSnn4GshsjW79IzVkUyY31LozGq8fcI=.01ecf0ab-641a-427d-bb65-f657df4f49e4@github.com> On Thu, 15 May 2025 21:56:32 GMT, Vladimir Ivanov wrote: >> A first part toward a better support of pure functions. >> >> ## Pure Functions >> >> Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc.. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. >> >> ## Scope >> >> We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused. Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. >> >> ## Implementation Overview >> >> We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. 
They are surprisingly hard to unify with other floating point functions from an implementation point of view! >> >> IR framework and IGV needed a little bit of fixing. >> >> Thanks, >> Marc > > Interesting! I wasn't aware ADLC already features such support. Thanks for the pointers. > > It does look attractive, especially for platform-specific use cases. But there are some pitfalls which makes it hard to use on its own. In particular, data nodes are aggressively commoned and freely flow in the graph. Unless it is taken into account during GVN and code motion, the final schedule may end up far from optimal. (In other words, it's highly beneficial to match only expensive nodes in such a way.) Moreover, some optimizations are highly sensitive to the presence of calls. (Think of the consequences of a call scheduled inside a heavily vectorized loop.) > > Macro-expansion also suffers from some of those issues, but still IMO an explicit `Call` node is a more appropriate solution to the problem. Tbh I don't understand @iwanowww arguments. We have expensive data nodes such as `SqrtD` that have control inputs to prevent them floating too aggressively. Additionally, a `CallNode` is pinned AT its control input, while a data node is pinned UNDER its control input. It gives the scheduler much more freedom scheduling a data node to a better location compared to a call node. Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial. 
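As background for the purity criterion in the PR description quoted above, the following minimal sketch contrasts the two cases it mentions at the Java level. The class and method names are hypothetical and chosen only for illustration; the behavior shown (NaN for a floating-point remainder by zero, `ArithmeticException` for an integer division by zero) is standard Java semantics.

```java
final class PurityCheck {
    // Floating-point remainder never throws: a zero divisor produces NaN,
    // so the operation has no effect on control flow.
    static double floatingRemainder(double a, double b) {
        return a % b;
    }

    // Integer division can throw, so it is not pure in the sense used in
    // the PR description. Returns true if the division threw.
    static boolean integerDivisionThrows(int a, int b) {
        try {
            int unused = a / b;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

For example, `PurityCheck.floatingRemainder(1.0, 0.0)` is NaN, while `PurityCheck.integerDivisionThrows(1, 0)` reports that an exception was raised.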
------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2889891820 From duke at openjdk.org Mon May 19 07:26:56 2025 From: duke at openjdk.org (duke) Date: Mon, 19 May 2025 07:26:56 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v6] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 15:00:30 GMT, Manuel H?ssig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the name phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>
>> Output with `-XX:PrintPhaseLevel=2` >> >> >>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java >> CompileCommand: compileonly TestLoop.test10 bool compileonly = true >> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true >> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) >> 3584 99 b 3 TestLoop::test10 (64 bytes) >> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. After Loop Optimizations >> 14. After Macro Expansion >> 15. Barrier expand >> 16. Optimize finished >> 17. Before matching >> 18. After matching >> 19. Global code motion >> 20. Register Allocation >> 21. Final Code >> 3668 103 b 4 TestLoop::test10 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. PhaseIdealLoop iterations 2 >> 14. PhaseIdealLoop iterations 3 >> 15. PhaseIdealLoop iterations 4 >> 16. PhaseIdealLoop iterations 5 >> 17. PhaseIdealLoop iterations 6 >> 18. PhaseIdealLoop iterations 7 >> 19. PhaseIdealLoop iterations 8 >> 20. PhaseIdealLoop iterations 9 >> 21. After Loop Optimizations >> 22. After Macro Expansion >> 23. Barrier expand >> 24. Optimize finished >> 25. Before matching >> 26. After matching >> 27. Global code motion >> 28. Registe... 
> > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn @mhaessig Your change (at version 69873d356286f3af36ae5cc88f4672b064e57c36) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25183#issuecomment-2889901028 From epeter at openjdk.org Mon May 19 07:31:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 07:31:15 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v39] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 15:31:42 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > tutorial with mutable and subtyping I'm realizing that we have to do a bit more work on the `Name`s. I'm not yet happy with the API, and I'll need your help here :) As a reminder: the `Name`s are there to model variables and fields, but possibly also other "names" like methods, classes, etc. 
Necessary features: - Add a `Name` for the current code scope. - Sample a random `Name` from the current code scope. - The `Name`s each have a `Type`. For example, this is the Java type of a variable/field. These `Type`s could be in subtype/supertype relations with each other. - This allows us to sample from only the `int` variables/fields, or from a fields/variables that are of a specific `Type`, or subtype thereof. - Each `Name` has a weight, so that sampling can be biased, i.e. some variables/fields are used more often than others. - We want to check if there are any variables of a type or subtype, so we can decide if we can even sample, or have to do something else. The current API: - `addName(new Name(String name, Type type, boolean onlyMutable, int weight))` - `sampleName(Type type, boolean onlyMutable)` -> samples from names of `type` and subtype thereof, possibly constraining to only mutable names if requested. - `weighNames(Type type, boolean onlyMutable)` -> weighs all names of `type` and subtype thereof, possibly constraining to only mutable names if requested. So far so good. One thing I missed: in some cases, you also want to sample from a `Type` or any supertype, and not just subtypes. I had so far only considered loading values from random variables: in that scenario, you must receive an Object/value that fits in your requested `Type`, which can be of that `Type` or any subtype. But there is the inverse problem: you want to store a given value, and that value can only be stored to a `Type` or a supertype thereof. And currently, the API does not allow you to only sample from a `Type` or supertypes. The API already has methods with multiple arguments, adding a `subtype/supertype` flag will make it even more clunky. Plus, it may be helpful to the user to also have a way to retrieve all `Name`s so that they can understand better what is happening in their code generation. 
I've been thinking about a new interface like this:
- `addName(String name, Type type, boolean onlyMutable, int weight)`, i.e. removing the need for the user to instantiate the `new Name`.
  - We could consider reducing the number of arguments here.
    - `addMutableName(String name, Type type)` could give a mutable name with weight 1.
    - `addImmutableName(String name, Type type)` could give an immutable name with weight 1.
- Access to the `Name`s could be given through more of a stream-like interface:
  - `names()` returns a `NameSetView`.
  - We can sample from it with `names().sample()`, picking a random element biased by the weights.
  - We can further filter the `NameSetView` with:
    - `names().onlyMutable()` to keep only mutable `Name`s and return a filtered `NameSetView`.
    - `names().subtypeOf(Type type)` and `names().supertypeOf(Type type)`.
    - We could even add a custom `NameSetView.map(filter)`, with a binary predicate `filter`, so the user has even more control.
  - Next to `names().sample()`, we could also have `names().weight()` and even a `names().toList()` so the user can access all names directly.

Another benefit of this `NameSetView` approach is that it is much easier to extend.
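To make the proposal easier to discuss, here is a minimal, self-contained sketch of what such a `NameSetView` could look like. Everything in it is an assumption for illustration rather than the framework's actual API: the `Type` interface is reduced to a single `isSubtypeOf` check, `SimpleType` is a toy single-inheritance hierarchy, the custom filter is called `filter` instead of `map(filter)`, and `sample` takes an explicit `Random` so that the behavior is reproducible.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

final class NameSetSketch {
    // Hypothetical stand-in for the framework's Type.
    interface Type {
        boolean isSubtypeOf(Type other);
    }

    // Toy type with at most one direct supertype (null for a root type).
    record SimpleType(String name, SimpleType parent) implements Type {
        @Override
        public boolean isSubtypeOf(Type other) {
            for (SimpleType t = this; t != null; t = t.parent) {
                if (t.equals(other)) {
                    return true;
                }
            }
            return false;
        }
    }

    record Name(String name, Type type, boolean mutable, int weight) {}

    // Immutable view over a set of names; every filter returns a new view.
    static final class NameSetView {
        private final List<Name> names;

        NameSetView(List<Name> names) {
            this.names = List.copyOf(names);
        }

        NameSetView filter(Predicate<Name> predicate) {
            return new NameSetView(names.stream().filter(predicate).toList());
        }

        NameSetView onlyMutable() {
            return filter(Name::mutable);
        }

        // Names whose type is `type` or a subtype of it (safe to load from).
        NameSetView subtypeOf(Type type) {
            return filter(n -> n.type().isSubtypeOf(type));
        }

        // Names whose type is `type` or a supertype of it (safe to store to).
        NameSetView supertypeOf(Type type) {
            return filter(n -> type.isSubtypeOf(n.type()));
        }

        int weight() {
            return names.stream().mapToInt(Name::weight).sum();
        }

        List<Name> toList() {
            return names;
        }

        // Picks a name with probability proportional to its weight.
        Name sample(Random random) {
            int total = weight();
            if (total <= 0) {
                throw new IllegalStateException("no names to sample from");
            }
            int r = random.nextInt(total);
            for (Name n : names) {
                r -= n.weight();
                if (r < 0) {
                    return n;
                }
            }
            throw new AssertionError("unreachable for non-negative weights");
        }
    }
}
```

With this shape, the load case becomes `names().subtypeOf(t).sample(...)` and the store case becomes `names().onlyMutable().supertypeOf(t).sample(...)`, and checking `weight() > 0` first answers whether sampling is possible at all. New constraints can be added as further filter methods without touching the existing ones.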
@chhagedorn We can discuss the options this afternoon in an offline meeting; I just wanted to sketch my ideas now already :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2889913886 From shade at openjdk.org Mon May 19 07:32:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 May 2025 07:32:56 GMT Subject: RFR: 8357166: Many AOT tests failed with VM crash In-Reply-To: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> References: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> Message-ID: <-5pwU95zCnUwjVYpsSAVuPtoWyJIJbU0pffcg3p5qBU=.c15e324b-7425-4fa9-a459-387c37c8856b@github.com> On Fri, 16 May 2025 22:22:21 GMT, Vladimir Kozlov wrote: > Disable AOT runtime blobs generation when VerifyOops flag is on. > AOT adapters are not affected - they don't have oops operation. > > Tested tier1 and tier7-comp (which failed without fix) Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25277#pullrequestreview-2849693756 From mhaessig at openjdk.org Mon May 19 07:42:58 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 19 May 2025 07:42:58 GMT Subject: Integrated: 8355970: C2: Add command line option to print the compile phases In-Reply-To: References: Message-ID: On Mon, 12 May 2025 13:03:16 GMT, Manuel Hässig wrote: > This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the name phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. > >
> Output with `-XX:PrintPhaseLevel=2` > > >> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java > CompileCommand: compileonly TestLoop.test10 bool compileonly = true > CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true > 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) > 3584 99 b 3 TestLoop::test10 (64 bytes) > 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > 8. PhaseIdealLoop 3 > 9. Before PhaseCCP 1 > 10. PhaseCCP 1 > 11. Iter GVN 2 > 12. PhaseIdealLoop iterations > 13. After Loop Optimizations > 14. After Macro Expansion > 15. Barrier expand > 16. Optimize finished > 17. Before matching > 18. After matching > 19. Global code motion > 20. Register Allocation > 21. Final Code > 3668 103 b 4 TestLoop::test10 (64 bytes) > 1. After Parsing > 2. Iter GVN 1 > 3. Incremental Inline > 4. Incremental Boxing Inline > 5. Before Loop Optimizations > 6. PhaseIdealLoop 1 > 7. PhaseIdealLoop 2 > 8. PhaseIdealLoop 3 > 9. Before PhaseCCP 1 > 10. PhaseCCP 1 > 11. Iter GVN 2 > 12. PhaseIdealLoop iterations > 13. PhaseIdealLoop iterations 2 > 14. PhaseIdealLoop iterations 3 > 15. PhaseIdealLoop iterations 4 > 16. PhaseIdealLoop iterations 5 > 17. PhaseIdealLoop iterations 6 > 18. PhaseIdealLoop iterations 7 > 19. PhaseIdealLoop iterations 8 > 20. PhaseIdealLoop iterations 9 > 21. After Loop Optimizations > 22. After Macro Expansion > 23. Barrier expand > 24. Optimize finished > 25. Before matching > 26. After matching > 27. Global code motion > 28. Register Allocation > 29. Final Code > >
> >
> Output with `-XX:PrintPhaseLevel=2` in conjunction with loo... This pull request has now been integrated. Changeset: 50a7c61d Author: Manuel H?ssig Committer: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/50a7c61d28b9885ff48f4fcd8bfd460b507bbcef Stats: 49 lines in 5 files changed: 28 ins; 9 del; 12 mod 8355970: C2: Add command line option to print the compile phases Reviewed-by: chagedorn, kvn, mchevalier ------------- PR: https://git.openjdk.org/jdk/pull/25183 From mchevalier at openjdk.org Mon May 19 07:42:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 19 May 2025 07:42:57 GMT Subject: RFR: 8355970: C2: Add command line option to print the compile phases [v6] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 15:00:30 GMT, Manuel H?ssig wrote: >> This PR introduces the flag `-XX:PrintPhaseLevel` that works like the flag `-XX:PrintIdealGraphLevel` and prints the name phases of a C2 compilation (essentially what we have in the left side bar in IGV) to the terminal. This allows redirecting the output to a file and comparing phase decisions between two compilations. Further, it is useful in conjunction with loop opts tracing to immediately see in which phase a certain optimization happened. >> >>
>> Output with `-XX:PrintPhaseLevel=2` >> >> >>> java-fastdebug -Xbatch -XX:CompileCommand=compileonly,TestLoop.test10 -XX:CompileCommand=printcompilation,TestLoop.test* -XX:PrintPhaseLevel=2 TestLoop.java >> CompileCommand: compileonly TestLoop.test10 bool compileonly = true >> CompileCommand: PrintCompilation TestLoop.test* bool PrintCompilation = true >> 3577 98 % b 3 TestLoop::test10 @ 2 (64 bytes) >> 3584 99 b 3 TestLoop::test10 (64 bytes) >> 3648 100 % b 4 TestLoop::test10 @ 2 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. After Loop Optimizations >> 14. After Macro Expansion >> 15. Barrier expand >> 16. Optimize finished >> 17. Before matching >> 18. After matching >> 19. Global code motion >> 20. Register Allocation >> 21. Final Code >> 3668 103 b 4 TestLoop::test10 (64 bytes) >> 1. After Parsing >> 2. Iter GVN 1 >> 3. Incremental Inline >> 4. Incremental Boxing Inline >> 5. Before Loop Optimizations >> 6. PhaseIdealLoop 1 >> 7. PhaseIdealLoop 2 >> 8. PhaseIdealLoop 3 >> 9. Before PhaseCCP 1 >> 10. PhaseCCP 1 >> 11. Iter GVN 2 >> 12. PhaseIdealLoop iterations >> 13. PhaseIdealLoop iterations 2 >> 14. PhaseIdealLoop iterations 3 >> 15. PhaseIdealLoop iterations 4 >> 16. PhaseIdealLoop iterations 5 >> 17. PhaseIdealLoop iterations 6 >> 18. PhaseIdealLoop iterations 7 >> 19. PhaseIdealLoop iterations 8 >> 20. PhaseIdealLoop iterations 9 >> 21. After Loop Optimizations >> 22. After Macro Expansion >> 23. Barrier expand >> 24. Optimize finished >> 25. Before matching >> 26. After matching >> 27. Global code motion >> 28. Registe... 
> > Manuel H?ssig has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from @chhagedorn > > Co-authored-by: Christian Hagedorn No more insights, but good work. src/hotspot/share/opto/compile.hpp line 644: > 642: > 643: // check the CompilerOracle for special behaviours for this compile > 644: bool method_has_option(CompileCommandEnum option) const { Good! ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/25183#pullrequestreview-2849719413 PR Review Comment: https://git.openjdk.org/jdk/pull/25183#discussion_r2095044114 From thartmann at openjdk.org Mon May 19 08:04:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 May 2025 08:04:54 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null In-Reply-To: References: Message-ID: <49ygi2IN3BH7wNu86C2UQSMTkVTxL0DFB46_-Ct7Gsw=.bcef926d-9d29-41ac-8957-adc74001b269@github.com> On Fri, 16 May 2025 14:16:29 GMT, Roland Westrelin wrote: > During IGVN, `TypeNode::make_paths_from_here_dead()` follows data > nodes until a `Phi`. The `Region` input for the input that that logic > goes through to reach the `Phi` is `null` causing the crash. I propose > simply adding an extra check for that corner case. The test could use some simplification / cleanup but looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25268#pullrequestreview-2849804863 From shade at openjdk.org Mon May 19 08:08:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 May 2025 08:08:00 GMT Subject: RFR: 8356946: x86: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 14 May 2025 09:48:54 GMT, Aleksey Shipilev wrote: > Noticed two awkward things in current x86 interpreter profiling code. > > First, we carry the implementation for counter decrements without using them. 
This is dead code, and can be purged. > > Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. > > So we can save a few instructions / memory accesses on this path. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25223#issuecomment-2890035032 From shade at openjdk.org Mon May 19 08:08:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 May 2025 08:08:00 GMT Subject: Integrated: 8356946: x86: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 14 May 2025 09:48:54 GMT, Aleksey Shipilev wrote: > Noticed two awkward things in current x86 interpreter profiling code. > > First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. > > Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. > > So we can save a few instructions / memory accesses on this path. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `tier1` > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 67fb1ee7 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/67fb1ee7f11c840a28ace21d381c86353fd9b22b Stats: 40 lines in 2 files changed: 0 ins; 30 del; 10 mod 8356946: x86: Optimize interpreter profile updates Reviewed-by: kvn, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/25223 From epeter at openjdk.org Mon May 19 08:13:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 08:13:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:59:18 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. 
>> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... 
> > Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f src/hotspot/share/opto/loopnode.hpp line 219: > 217: > 218: virtual void set_trip_count(julong tc) = 0; > 219: virtual julong trip_count() = 0; GitHub Actions seems to disagree with something here ;) /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:219:18: error: ?virtual julong BaseCountedLoopNode::trip_count()? was hidden [-Werror=overloaded-virtual] 219 | virtual julong trip_count() = 0; | ^~~~~~~~~~ /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:310:10: note: by ?julong CountedLoopNode::trip_count() const? 
310 | julong trip_count() const { return _trip_count; } | ^~~~~~~~~~ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2095111310 From epeter at openjdk.org Mon May 19 08:18:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 08:18:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:59:18 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. 
Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... 
> > Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f src/hotspot/share/opto/c2_globals.hpp line 839: > 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ > 838: "long counted loop/long range checks: don't create loop nest if " \ > 839: "loop runs for small enough number of iterations.") \ It sounds like we are doing this: Disable an exception, which disables an optimization. This double negation can be a little confusing / ambiguous. I wonder if we should instead have a limit here, which we can move to 0, or higher. That would allow us to benchmark with different levels more easily too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2095119671 From fyang at openjdk.org Mon May 19 08:46:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 19 May 2025 08:46:52 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v10] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:45:31 GMT, Anjian-Wen wrote: >> The current version is okay. I mean, we can unroll the storage bytes and save some "bnez" by eliminating the loop, something like this: >> >> bind(unroll_4); >> test_bit(tmp, count, 2); >> beqz(tmp, unroll_2); >> sb(value, Address(dest, 0)); >> sb(value, Address(dest, 1)); >> sb(value, Address(dest, 2)); >> sb(value, Address(dest, 3)); >> addi(dest, dest, 4); >> subi(count, count, 4); >> >> bind(unroll_2); >> test_bit(tmp, count, 1); >> beqz(tmp, unroll_1); >> sb(value, Address(dest, 0)); >> sb(value, Address(dest, 1)); >> addi(dest, dest, 2); >> subi(count, count, 2); >> >> bind(unroll_1); >> test_bit(tmp, count, 0); >> beqz(tmp, end); >> sb(value, Address(dest, 0)); >> addi(dest, dest, 1); >> subi(count, count, 1); >> >> bind(end); > > I understand, that makes sense, it seems it can save 4 jumps when count is 7, I will test on that later, thanks!! Interesting! There's no need to update `dest` and `count` in `unroll_1`. And I think we can reuse existing label names like `L_fill_4`, `L_fill_2`, `L_fill_1` here which will look more obvious. We can make them local with `{` and `}` when they are declared and used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2095176894 From luhenry at openjdk.org Mon May 19 08:49:52 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 19 May 2025 08:49:52 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:46:12 GMT, Robbin Ehn wrote: > Hi, please consider. 
> > While working on https://github.com/openjdk/jdk/pull/25252, I notice: > - Major op code was just repeat > - Width coded in binary > - Stores have mixed up rs1 and rs2 > - Bonus, fsd used a macro for no reason > > I think this improves readability. > > Tested tier1 > > Thanks, Robbin Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25253#pullrequestreview-2849945008 From mcimadamore at openjdk.org Mon May 19 09:04:59 2025 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Mon, 19 May 2025 09:04:59 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 12:38:03 GMT, Roland Westrelin wrote: > Right. But it's Maurizio's benchmark. I think it would make sense to integrate it separately. What do you think @mcimadamore ? I tend to agree that it would be better to have the benchmark checked in as part of this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2890216155 From epeter at openjdk.org Mon May 19 09:11:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 09:11:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: <5XjnBOKh8AL8v4IFn521v3Hu4fkTnrQYVLKUb_J6U3o=.82c9c031-940c-4e2e-972e-9119af6a604b@github.com> On Fri, 9 May 2025 12:38:03 GMT, Roland Westrelin wrote: >>> Interesting work and results with the benchmark! I have a few comments. Will have another look again later. >> >> Thanks for reviewing this @chhagedorn >> I think new commit addresses all your comments. > >> @rwestrel Sorry it took so long for me to look at this. >> >> Did you do some benchmarking to prove that `ShortLoopIter = 1000` is reasonable? > > As mentioned in one of my replies to your questions, I removed that command line argument. 
> >> The Benchmark you published earlier does not seem to do that, right? [#21630 (comment)](https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221) >> >> I would also like to see that benchmark integrated. If you are using a benchmark to demonstrate the performance, it should be integrate so others can easily verify on their platform :) > > Right. But it's Maurizio's benchmark. I think it would make sense to integrate it separately. What do you think @mcimadamore ? @rwestrel You could add the benchmark and add @mcimadamore as a contributor :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2890236505 From epeter at openjdk.org Mon May 19 09:11:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 09:11:03 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 08:14:48 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: >> >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/graphKit.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn 
>> - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f > > src/hotspot/share/opto/c2_globals.hpp line 839: > >> 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ >> 838: "long counted loop/long range checks: don't create loop nest if " \ >> 839: "loop runs for small enough number of iterations.") \ > > It sounds like we are doing this: > Disable an exception, which disables an optimization. > > This double negation can be a little confusing / ambiguous. > > I wonder if we should instead have a limit here, which we can move to 0, or higher. > That would allow us to benchmark with different levels more easily too. It could be named `ShortRunningLongLoopIterationLimit`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2095234651 From epeter at openjdk.org Mon May 19 09:13:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 09:13:59 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:59:18 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). 
>> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loops. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... 
> > Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f I also looked back at the results, I think it was this: https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221 Nice to see that your patch is faster for small iterations. But what I am missing: where do the lines cross? I.e. at what iteration count does the loop-nest become profitable? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2890249311 From epeter at openjdk.org Mon May 19 09:16:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 09:16:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Fri, 9 May 2025 12:38:03 GMT, Roland Westrelin wrote: >>> Interesting work and results with the benchmark! I have a few comments. Will have another look again later. >> >> Thanks for reviewing this @chhagedorn >> I think new commit addresses all your comments. 
> >> @rwestrel Sorry it took so long for me to look at this. >> >> Did you do some benchmarking to prove that `ShortLoopIter = 1000` is reasonable? > > As mentioned in one of my replies to your questions, I removed that command line argument. > >> The Benchmark you published earlier does not seem to do that, right? [#21630 (comment)](https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221) >> >> I would also like to see that benchmark integrated. If you are using a benchmark to demonstrate the performance, it should be integrated so others can easily verify on their platform :) > > Right. But it's Maurizio's benchmark. I think it would make sense to integrate it separately. What do you think @mcimadamore ? @rwestrel The PR description is now out of sync with what you are doing. For example, it still mentions `ShortLoopIter`. It would also be nice if the benchmark results were in the PR description. At least links to the relevant messages below, because otherwise the most relevant information is buried somewhere in the "hidden items". ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2890256755 From mchevalier at openjdk.org Mon May 19 09:20:02 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 19 May 2025 09:20:02 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll Message-ID: This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is `< (julong)max_juint/2`, where `max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. 
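The arithmetic above can be checked with a small standalone sketch. This is not HotSpot code: `julong` and `max_juint` are re-declared locally as assumed equivalents of the HotSpot definitions, and `current_bound` / `relaxed_bound` are hypothetical helper names that only restate the two conditions quoted in the message.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins, assumed to mirror HotSpot's julong and max_juint.
using julong = std::uint64_t;
constexpr julong max_juint = 0xFFFFFFFFull;          // 2^32 - 1
constexpr julong two_pow_31 = julong(1) << 31;       // 2^31

// Integer division floors, so max_juint / 2 is 2^31 - 1.
static_assert(max_juint / 2 == two_pow_31 - 1, "max_juint/2 should be 2^31 - 1");

// Condition as described for the existing assert: trip_count < max_juint/2.
constexpr bool current_bound(julong trip_count) {
  return trip_count < max_juint / 2;
}

// Condition the message argues for: trip_count <= 2^31.
constexpr bool relaxed_bound(julong trip_count) {
  return trip_count <= two_pow_31;
}

// The two conditions disagree exactly on 2^31 - 1 and 2^31.
static_assert(current_bound(two_pow_31 - 2), "largest value the strict form accepts");
static_assert(!current_bound(two_pow_31 - 1), "rejected by the strict form");
static_assert(relaxed_bound(two_pow_31 - 1), "accepted by the relaxed form");
static_assert(relaxed_bound(two_pow_31), "accepted by the relaxed form");
static_assert(!relaxed_bound(two_pow_31 + 1), "still rejected by the relaxed form");
```

All checks are `static_assert`s, so compiling the snippet is enough to confirm that the strict form rejects the two trip counts 2^31-1 and 2^31 that the relaxed form would accept.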
------------- Commit messages: - Relax the assert Changes: https://git.openjdk.org/jdk/pull/25295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356647 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25295/head:pull/25295 PR: https://git.openjdk.org/jdk/pull/25295 From thartmann at openjdk.org Mon May 19 09:25:56 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 19 May 2025 09:25:56 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 09:21:34 GMT, Emanuel Peter wrote: >> **Summary** >> >> Before [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155) / https://github.com/openjdk/jdk/pull/18822, we used to prefer aligning to stores. But in that change, I removed that preference, and since then we have been aligning to loads instead (there is no preference, but since loads usually come before stores in the loop body, the load gets picked). This lead to a performance regression, especially on `x64`. >> >> Especially on `x64`, it is more important to align stores than aligning loads. This is because memory operations that cross a cacheline boundary are split. And `x64` CPU's generally have more throughput for loads than for stores, so splitting a store is worse than splitting a load. >> >> On `aarch64`, the results are less clear. On two machines, the differences were marginal, but surprisingly aligning to loads was marginally faster. On another machine, aligning to stores was significantly faster. I suspect performance depends on the exact `aarch64` implementation. I'm not an `aarch64` specialist, and only have access to a limited number of machines. 
>> >> **Fix**: make automatic alignment configurable with `SuperWordAutomaticAlignment` (no alignment, align to store, align to load). Default is align to store. >> >> For now, I will just align to stores on all platforms. If someone has various `aarch64` machines, they are welcome to do deeper investigations. Same for other platforms. We could always turn the flag into a platform dependent one, and set different defaults depending on the exact CPU. >> >> If you are interested you can read my investigations/benchmark results below. There are a lot of colorful plots. >> >> **FYI about Vector API:** if you are working with the Vector API, you may also want to worry about **alignment**, because there can be a **significant performance impact** (30%+ in some cases). You may also want to know about **4k aliasing**, discussed below. >> >> **Shoutout:** >> - @jatin-bhateja filed the regression, and explained that it was about split stores. >> - @mhaessig helped me talk through some of the early benchmarks. >> - @iwanowww pointed me to the 4k aliasing explanation. >> >> -------------------- >> >> **Introduction** >> >> I had long lived with the **theory that on modern CPUs, misalignment has no consequence, especially no performance impact**. When you google, many sources say that misalignment used to be an issue on older CPUs, but not any more. >> >> That may **technically** be true: >> - A ... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/superword.cpp > > Co-authored-by: Manuel Hässig > Impressive analysis, Emanuel! Very deep, thorough, and insightful. +1 to this. Great work, Emanuel! The fix looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25065#pullrequestreview-2850063795 From mhaessig at openjdk.org Mon May 19 09:27:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 19 May 2025 09:27:56 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Thu, 15 May 2025 15:13:18 GMT, Hannes Greule wrote: > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. > > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I am now using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. 
> > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. > > During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. Thank you for working on this, @SirYwell. I especially like the citations directly from the spec to motivate and justify the optimizations. I commented only on the `int` side of things, but the comments apply equally to the `long` changes. You exclude zero from the dividend magnitude only based on the constant check. That is not correct. You have to check the range as well to exclude zero. Hence, it would also be nice to have test cases where the value is known to be in a given range in the ideal graph. To get such a value, you can call `array.length()`, which is always `>=0`, or use `Parse::sharpen_type_after_if()`: https://github.com/openjdk/jdk/blob/effe40a24c29dc507eea5efef7b0736a33bc34a7/src/hotspot/share/opto/parse2.cpp#L1772-L1794 src/hotspot/share/opto/divnode.cpp line 1229: > 1227: // Mod by zero? Throw exception at runtime! 
> 1228: if (i2->is_con() && i2->get_con() == 0) { > 1229: return TypeInt::ZERO; Like @merykitty , I am unsure of returning zero in this case. The original code probably returned `TypeInt::POS` for the same reason you bring up below: > JVMS `irem` bytecode: "the result of the remainder operation can be negative only if the dividend is negative and can be positive only if the dividend is positive" Hence, I would argue to keep that old behavior, since the result of a modulo with zero is not defined to be zero. I like the idea of returning TOP, but that needs to be tested really well, since all uses of the modulo computation will get removed. I am not familiar enough with the type lattice to reason about the formal correctness of this. src/hotspot/share/opto/divnode.cpp line 1242: > 1240: // The magnitude of the divisor is in range [1, 2^31]. > 1241: // We know it isn't 0 as we handled that above. > 1242: // That means at least one value is nonzero, so its absolute value is bigger than zero. Is that really what you checked above? AFAIU, above you check whether the divisor is a zero constant. But if the divisor is not a constant, then its range might still contain zero. You should check this claim using the bounds, otherwise this will not hold. test/hotspot/jtreg/compiler/c2/gvn/ModINodeValueTests.java line 39: > 37: * @bug 8356813 > 38: * @summary Test that Value method of ModINode is working as expected. > 39: * @library /test/lib / Suggestion: * @summary Test that Value method of ModINode is working as expected. * @key randomness * @library /test/lib / test/hotspot/jtreg/compiler/c2/gvn/ModINodeValueTests.java line 43: > 41: */ > 42: public class ModINodeValueTests { > 43: private static final Generator G = Generators.G.ints(); Nit: please give `G` a more meaningful name like `IntGenerator`. That way, it is easier to see what falls out of the `next()` method. test/hotspot/jtreg/compiler/c2/gvn/ModINodeValueTests.java line 91: > 89: // i.e., posVal % x < 0 => false. 
> 90: public boolean nonNegativeDividend(int x) { > 91: return x != 0 && 123 % x < 0; It would be nice to have random dividends in the tests as well to further increase coverage. To still get constant folding, you can put random values in final fields. test/hotspot/jtreg/compiler/c2/gvn/ModLNodeValueTests.java line 39: > 37: * @bug 8356813 > 38: * @summary Test that Value method of ModLNode is working as expected. > 39: * @library /test/lib / Suggestion: * @summary Test that Value method of ModLNode is working as expected. * @key randomness * @library /test/lib / ------------- Changes requested by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-2849962885 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2095215903 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2095201222 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2095218234 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2095220887 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2095224553 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2095225118 From epeter at openjdk.org Mon May 19 09:32:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 09:32:54 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 09:23:28 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/superword.cpp >> >> Co-authored-by: Manuel Hässig > >> Impressive analysis, Emanuel! Very deep, thorough, and insightful. > > +1 to this. Great work, Emanuel! The fix looks good to me. @TobiHartmann Thank you for the review :) @theRealAph @XiaohongGong Do you have any idea about the somewhat confusing behavior of aarch64 in these benchmarks? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2890320006 From shade at openjdk.org Mon May 19 09:36:00 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 19 May 2025 09:36:00 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v5] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Fri, 16 May 2025 05:12:38 GMT, kuaiwei wrote: >> I wrote a test to check if every C2 IR node has correct size_of() function. And I found some of them are missed. They added new fields and not add size_of() to reflect new size. In linux, it does not cause issue so far, because gcc allocate more space for alignment and can keep these additional `bool` flags. But it will report failure on windows. And if anyone modified base class, it will cause problem. >> >> PS, My test is in https://github.com/openjdk/jdk/compare/master...kuaiwei:jdk:test/check_node_size , but it has many hack on IR nodes to make test to run. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Minor change Post-review comment: Doesn't this mean that super-class `Node::size_of` gives us a wrong answer for any node that has its own fields? uint Node::size_of() const { return sizeof(*this); } So, this looks mechanically preventable by making `Node::size_of` pure virtual, and thus _forcing_ subclasses to implement its own `size_of`. 
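A minimal sketch of the point above, using toy classes rather than HotSpot's `Node` hierarchy (`BaseNode`, `ForgetfulNode`, `StrictNode` and `StrictChild` are hypothetical names, and `std::size_t` stands in for `uint`). It shows that `sizeof(*this)` in a base-class method reports the size of the base type, and that a pure virtual declaration forces a directly derived concrete class to supply its own answer.

```cpp
#include <cassert>
#include <cstddef>

// Toy base class with a default size_of(), mirroring the quoted one-liner.
struct BaseNode {
  void* input = nullptr;
  virtual ~BaseNode() = default;
  // *this has static type BaseNode here, whatever the dynamic type is.
  virtual std::size_t size_of() const { return sizeof(*this); }
};

// Adds fields but forgets to override size_of(): the reported size is too small.
struct ForgetfulNode : BaseNode {
  long long payload[4] = {};
};

// Alternative base: size_of() is pure virtual, so a concrete subclass
// that does not implement it fails to compile when instantiated.
struct StrictNode {
  void* input = nullptr;
  virtual ~StrictNode() = default;
  virtual std::size_t size_of() const = 0;
};

struct StrictChild : StrictNode {
  long long payload[4] = {};
  std::size_t size_of() const override { return sizeof(*this); }
};

// Helpers that query the size through a base-class reference.
inline std::size_t reported_size(const BaseNode& n) { return n.size_of(); }
inline std::size_t reported_size(const StrictNode& n) { return n.size_of(); }
```

One caveat of this sketch: a pure virtual method only forces the first concrete class in an inheritance chain to implement it. A class deriving from `StrictChild` and adding fields would still inherit `StrictChild::size_of` silently, so a test that checks every node class may remain useful either way.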
------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2890328747 From luhenry at openjdk.org Mon May 19 10:06:02 2025 From: luhenry at openjdk.org (Ludovic Henry) Date: Mon, 19 May 2025 10:06:02 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3] In-Reply-To: References: Message-ID: <4-XDIYVf8D-tkhxf6OXcl6KYmMTrI9cv6H5Xj9_1wAo=.6ae58c9d-cfa7-465e-8ca5-9b169b1050b5@github.com> On Wed, 14 May 2025 08:30:07 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. >> Thanks! >> >> ## Test >> >> Performance data >> >> Benchmark | (vectorDim) | Mode | Cnt | Score - patch | Score - master | Improvement (master/patch) | Error | Units >> -- | -- | -- | -- | -- | -- | -- | -- | -- >> Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 382.123 | 2595.718 | 6.793 | 0.631 | ns/op >> Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 563.726 | 5167.687 | 9.167 | 0.063 | ns/op >> Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 888.455 | 9468.714 | 10.658 | 0.147 | ns/op >> Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 1540.255 | 18879.796 | 12.258 | 0.396 | ns/op >> Float16OperationsBenchmark.divBenchmark | 256 | avgt | 10 | 579.959 | 4028.335 | 6.946 | 0.008 | ns/op >> Float16OperationsBenchmark.divBenchmark | 512 | avgt | 10 | 914.634 | 8034.234 | 8.784 | 0.027 | ns/op >> Float16OperationsBenchmark.divBenchmark | 1024 | avgt | 10 | 1494.017 | 15125.924 | 10.124 | 0.292 | ns/op >> Float16OperationsBenchmark.divBenchmark | 2048 | avgt | 10 | 2728.517 | 30197.97 | 11.068 | 32.869 | ns/op >> Float16OperationsBenchmark.fmaBenchmark | 256 | avgt | 10 | 476.764 | 2817.035 | 5.909 | 0.012 | ns/op >> Float16OperationsBenchmark.fmaBenchmark | 512 | avgt | 10 | 707.035 | 5239.438 | 7.41 | 0.129 | ns/op >> Float16OperationsBenchmark.fmaBenchmark | 1024 | avgt | 10 | 
1114.29 | 7361.105 | 6.606 | 0.024 | ns/op >> Float16OperationsBenchmark.fmaBenchmark | 2048 | avgt | 10 | 1931.713 | 14465.602 | 7.488 | 1.852 | ns/op >> Float16OperationsBenchmark.maxBenchmark | 256 | avgt | 10 | 501.892 | 3754.563 | 7.481 | 0.408 | ns/op >> Float16OperationsBenchmark.maxBenchmark | 512 | avgt | 10 | 738.148 | 7450.666 | 10.094 | 1.206 | ns/op >> Float16OperationsBenchmark.maxBenchmark | 1024 | avgt | 10 | 1195.262 | 15463.892 | 12.938 | 8.889 | ns/op >> Float16OperationsBenchmark.maxBenchmark | 2048 | avgt | 10 | 2253.656 | 30649.239 | 13.6 | 6.154 | ns/op >> Float16OperationsBenchmark.minBenchmark | 256 | avgt | 10 | 501.873 | 3753.9 | 7.48 ... > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > minor Marked as reviewed by luhenry (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25181#pullrequestreview-2850192105 From mli at openjdk.org Mon May 19 10:45:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 19 May 2025 10:45:59 GMT Subject: RFR: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization [v3] In-Reply-To: <4-XDIYVf8D-tkhxf6OXcl6KYmMTrI9cv6H5Xj9_1wAo=.6ae58c9d-cfa7-465e-8ca5-9b169b1050b5@github.com> References: <4-XDIYVf8D-tkhxf6OXcl6KYmMTrI9cv6H5Xj9_1wAo=.6ae58c9d-cfa7-465e-8ca5-9b169b1050b5@github.com> Message-ID: On Mon, 19 May 2025 10:03:30 GMT, Ludovic Henry wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> minor > > Marked as reviewed by luhenry (Committer). Thank you @luhenry @DingliZhang ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25181#issuecomment-2890539133 From roland at openjdk.org Mon May 19 11:10:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 May 2025 11:10:56 GMT Subject: RFR: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling [v2] In-Reply-To: <-n4NMccK5Dn1ud60aMQ7VCzOz_nXB76BECvqXojFXVw=.7d6bbfa3-b9fe-4779-8974-61267ef343c7@github.com> References: <-n4NMccK5Dn1ud60aMQ7VCzOz_nXB76BECvqXojFXVw=.7d6bbfa3-b9fe-4779-8974-61267ef343c7@github.com> Message-ID: On Fri, 16 May 2025 13:17:01 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann @chhagedorn thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25262#issuecomment-2890608136 From roland at openjdk.org Mon May 19 11:10:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 19 May 2025 11:10:58 GMT Subject: Integrated: 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling In-Reply-To: References: Message-ID: On Fri, 16 May 2025 08:28:20 GMT, Roland Westrelin wrote: > This is an issue similar to 8349139: the type of the iv phi of a > counted loop is narrowed down so a `Div` node doesn't need a control > input. The loop is then peeled. The `Div` in the loop body is > guaranteed to be non zero only if it is actually executed so the `Div` > is implicitly dependent on the zero trip guard. Then the loop looses > its backedge and the `Div` freely floats. The `Div` instruction is > scheduled above the zero trip guard and faults. Had the `Div` been > control dependent on the zero trip guard, it wouldn't have > executed. 
The fix, similar to 8349139 is to add a `CastII` on peeling > to make the dependency between what's in the loop body and relies on > the narrowed down type of the iv phi and the zero trip guard explicit. This pull request has now been integrated. Changeset: 26cb016b Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/26cb016b750963a4622985399130024792691984 Stats: 64 lines in 2 files changed: 62 ins; 0 del; 2 mod 8350329: C2: Div looses dependency on condition that guarantees divisor not zero in counted loop after peeling Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25262 From jbhateja at openjdk.org Mon May 19 11:30:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 19 May 2025 11:30:50 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Mon, 19 May 2025 03:10:46 GMT, Xiaohong Gong wrote: >> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). >> >> Two key areas require improvement: >> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. >> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. 
>> >> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. >> >> Main changes: >> 1. Java-side API refactoring: >> - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on >> architectures like AArch64. >> 2. C2 compiler IR refactoring: >> - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. >> 3. Backend changes: >> - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. >> >> Performance: >> The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: >> >> Benchmark Mode Cnt Unit SIZE Before After Gain >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.31... > > Ping again~ could any one please take a look at this PR? Thanks a lot! Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing and get back. Do you have any idea about following regression? 
GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 64 55844.814 48311.847 0.86
GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 256 15139.459 13009.848 0.85
GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 1024 3861.834 3284.944 0.85
GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 4096 938.665 817.673 0.87
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2890659302 From galder at openjdk.org Mon May 19 12:23:53 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Mon, 19 May 2025 12:23:53 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Thu, 15 May 2025 12:33:18 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/library_call.cpp line 5554: >> >>> 5552: if (proj->_con == TypeFunc::Memory) { >>> 5553: int alias_idx = C->get_alias_index(proj->adr_type()); >>> 5554: assert(alias_idx == Compile::AliasIdxRaw || alias_idx == elemidx || alias_idx == mark_idx || alias_idx == klass_idx, "should be raw memory or array element type"); >> >> Shouldn't this `assert` be wrapped around an `#ifdef ASSERT` section? > > `assert` is a nop if `ASSERT` not defined. Does that answer your question?
Yup, thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2095585908 From jkarthikeyan at openjdk.org Mon May 19 12:50:52 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 19 May 2025 12:50:52 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 19 May 2025 06:48:55 GMT, Emanuel Peter wrote: >> @eme64 I think `vm.flavor == "server"` ensures that the test runs on server-class VMs, but I wasn't sure if manually setting the `TieredStopAtLevel` changes that designation. I looked at some more tests and I saw that they were using `@requires vm.flavor == "server" & (vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4)`, which looks like a good way to check for any tier4 compiler. I've updated the patch to use that method instead, which should let it run on graal as well. > > @jaskarth Thanks for doing the research on this, nice to see that there is a solution :) @eme64 @chhagedorn Thanks a lot for the reviews! Does testing need to be run on this change, or can I integrate it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2890888703 From rcastanedalo at openjdk.org Mon May 19 12:53:56 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 May 2025 12:53:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: > I think you only have to mark both the lea and the memory access with an exception table entry. Could you elaborate a bit more on this part of your suggestion?
My understanding is that [C2's `PhaseOutput`](https://github.com/openjdk/jdk/blob/3acfa9e4e7be2f37ac55f97348aad4f74ba802a0/src/hotspot/share/opto/output.hpp#L72) (the component responsible for populating the [implicit null exception table](https://github.com/openjdk/jdk/blob/3acfa9e4e7be2f37ac55f97348aad4f74ba802a0/src/hotspot/share/opto/output.cpp#L3451)) can at most add one entry per Mach node (in this case `zLoadP`), where [the entry key is the address of the first emitted machine instruction](https://github.com/openjdk/jdk/blob/3acfa9e4e7be2f37ac55f97348aad4f74ba802a0/src/hotspot/share/opto/output.cpp#L1611-L1614). Therefore if we want to mark both the lea and the memory access as you suggest, we would need to extend `C2_MacroAssembler` to express which instructions we want to mark and extend C2's `PhaseOutput` to add entries for each of the marked instructions. Is there a simpler way I have missed to achieve this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2095643031 From mli at openjdk.org Mon May 19 13:35:00 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 19 May 2025 13:35:00 GMT Subject: Integrated: 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization In-Reply-To: References: Message-ID: On Mon, 12 May 2025 11:35:40 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > It's a follow-up of https://github.com/openjdk/jdk/commit/9a3f9997b68a1f64e53b9711b878fb073c3c9b90. > Thanks! 
> > ## Test > > Performance data > > Benchmark | (vectorDim) | Mode | Cnt | Score - patch | Score - master | Improvement (master/patch) | Error | Units > -- | -- | -- | -- | -- | -- | -- | -- | -- > Float16OperationsBenchmark.addBenchmark | 256 | avgt | 10 | 382.123 | 2595.718 | 6.793 | 0.631 | ns/op > Float16OperationsBenchmark.addBenchmark | 512 | avgt | 10 | 563.726 | 5167.687 | 9.167 | 0.063 | ns/op > Float16OperationsBenchmark.addBenchmark | 1024 | avgt | 10 | 888.455 | 9468.714 | 10.658 | 0.147 | ns/op > Float16OperationsBenchmark.addBenchmark | 2048 | avgt | 10 | 1540.255 | 18879.796 | 12.258 | 0.396 | ns/op > Float16OperationsBenchmark.divBenchmark | 256 | avgt | 10 | 579.959 | 4028.335 | 6.946 | 0.008 | ns/op > Float16OperationsBenchmark.divBenchmark | 512 | avgt | 10 | 914.634 | 8034.234 | 8.784 | 0.027 | ns/op > Float16OperationsBenchmark.divBenchmark | 1024 | avgt | 10 | 1494.017 | 15125.924 | 10.124 | 0.292 | ns/op > Float16OperationsBenchmark.divBenchmark | 2048 | avgt | 10 | 2728.517 | 30197.97 | 11.068 | 32.869 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 256 | avgt | 10 | 476.764 | 2817.035 | 5.909 | 0.012 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 512 | avgt | 10 | 707.035 | 5239.438 | 7.41 | 0.129 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 1024 | avgt | 10 | 1114.29 | 7361.105 | 6.606 | 0.024 | ns/op > Float16OperationsBenchmark.fmaBenchmark | 2048 | avgt | 10 | 1931.713 | 14465.602 | 7.488 | 1.852 | ns/op > Float16OperationsBenchmark.maxBenchmark | 256 | avgt | 10 | 501.892 | 3754.563 | 7.481 | 0.408 | ns/op > Float16OperationsBenchmark.maxBenchmark | 512 | avgt | 10 | 738.148 | 7450.666 | 10.094 | 1.206 | ns/op > Float16OperationsBenchmark.maxBenchmark | 1024 | avgt | 10 | 1195.262 | 15463.892 | 12.938 | 8.889 | ns/op > Float16OperationsBenchmark.maxBenchmark | 2048 | avgt | 10 | 2253.656 | 30649.239 | 13.6 | 6.154 | ns/op > Float16OperationsBenchmark.minBenchmark | 256 | avgt | 10 | 501.873 | 3753.9 | 7.48 | 0.298 | 
ns/op > Float16OperationsBenchmark.minBenchmark ... This pull request has now been integrated. Changeset: 92fd4499 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/92fd44992b9326fa10ec8303394dac17bb81b168 Stats: 160 lines in 3 files changed: 145 ins; 4 del; 11 mod 8350960: RISC-V: Add riscv backend for Float16 operations - vectorization Reviewed-by: fyang, dzhang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/25181 From rcastanedalo at openjdk.org Mon May 19 13:49:56 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 19 May 2025 13:49:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Fri, 16 May 2025 13:06:25 GMT, Roland Westrelin wrote: > > Note that I am not necessarily suggesting disabling "late" elimination of allocations at macro expansion. But it would be good, in light of the above findings, to find actual cases where the seemingly simpler alternative of removing dead allocations early is not sufficient for correctness, to motivate the more complex approach proposed in this PR. > > What happens with this bug is that a Phi created sometime after parsing inherits the type of the projection out of the Initialize which is wrong. No issue happens until the allocation is removed though. Removing allocations early only shortens the window where bad things (a new Phi) can happen. But bad things could still happen. After all, we do some loop opts to help EA so maybe a similar issue could happen there. Or maybe, down the road, someone will change the way we do loop opts during EA because it helps EA and the bug will be back but we don't necessarily notice it until it happens at a user's site.
> > Beyond that, you're suggesting restricting elimination of allocation. What if, down the road, someone notices that it gets in the way of some other optimization? Then that someone will have to reconstruct the history here. > > There's a history in C2 of fixing issues that are complicated by working around them. Often what happens is that we, later on, realize that the first workaround wasn't sufficient and try to pile on more workarounds. I also already ran into situations where everything to perform the needed optimization is there but it's disabled for some reason in a particular case and it's unclear why. > > I'm in favor of fixing issues once and for all by targeting their root cause and trying to not sacrifice performance whenever possible. Truth is we have a limited view into what people are running through a set of microbenchmarks but we can't be sure something has no performance impact only because that set of microbenchmarks doesn't regress. Beyond that, what can appear as a safer workaround today could be more trouble down the line and in the end cause more confusion and work. Thanks for the elaborate reply, Roland. I am also in favor of fundamental fixes instead of accumulating point fixes over time. Note that I do not suggest restricting elimination of dead array allocations. I guess I am mostly surprised to learn that we do not eliminate them as early as we can, but I understand your concern that similar issues may arise even if we did. I still think it would be good to include test cases to confirm that these are not only theoretical concerns, but that should not block the progress of this PR. I also think it would be good to investigate, separately, early elimination of dead array allocations, even after the integration of this work. Dead allocations may inhibit later optimizations so it would be good to eliminate them as early as possible anyway.
One difficulty (not addressed in https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58) is that early array elimination should still generate the nonnegative array size check code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2891102076 From chagedorn at openjdk.org Mon May 19 13:53:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 May 2025 13:53:55 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 [v2] In-Reply-To: <2RgkiVAWZ2fjJ-V4DP68sSiGAe4GzGAujK2t5yhYpyQ=.e26331df-e781-4b4c-856d-edde038c2389@github.com> References: <2RgkiVAWZ2fjJ-V4DP68sSiGAe4GzGAujK2t5yhYpyQ=.e26331df-e781-4b4c-856d-edde038c2389@github.com> Message-ID: On Mon, 19 May 2025 05:01:43 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. >> >> Reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Change requires to run test on Graal as well Good improvement! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25243#pullrequestreview-2850852361 From chagedorn at openjdk.org Mon May 19 13:58:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 19 May 2025 13:58:57 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 19 May 2025 06:48:55 GMT, Emanuel Peter wrote: >> @eme64 I think `vm.flavor == "server"` ensures that the test runs on server-class VMs, but I wasn't sure if manually setting the `TieredStopAtLevel` changes that designation. I looked at some more tests and I saw that they were using `@requires vm.flavor == "server" & (vm.opt.TieredStopAtLevel == null | vm.opt.TieredStopAtLevel == 4)`, which looks like a good way to check for any tier4 compiler. I've updated the patch to use that method instead, which should let it run on graal as well. > > @jaskarth Thanks for doing the research on this, nice to see that there is a solution :) > @eme64 @chhagedorn Thanks a lot for the reviews! Does testing need to be run on this change, or can I integrate it? I think it's good to go in since you only exclude it from running in the problematic scenarios. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2891134635 From epeter at openjdk.org Mon May 19 14:17:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 14:17:53 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 19 May 2025 12:48:25 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Thanks for doing the research on this, nice to see that there is a solution :) > > @eme64 @chhagedorn Thanks a lot for the reviews! 
Does testing need to be run on this change, or can I integrate it? @jaskarth Thanks for asking. I'll run some testing, just to be safe :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2891199341 From epeter at openjdk.org Mon May 19 14:38:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 14:38:03 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v32] In-Reply-To: References: <32OxhVRhwuY_Flt3Dmo-mcU5ruQIptcC2lBATGpQdZc=.ceeb5e58-b083-445d-a7dd-131380c75508@github.com> Message-ID: On Fri, 16 May 2025 12:02:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix up review suggestions > > Some more comments, I will take this up again on Monday. Looking good so far :-) Great work! @chhagedorn @mhaessig Thanks for the offline meeting, we really made some good progress. Here the whiteboard we worked on: ![image](https://github.com/user-attachments/assets/c75bb45a-3e51-415f-9029-6f62deb588bb) We decided to have two kinds of `Name`s: - `DataName`: can be mutable or immutable. These are used to model variables and fields. - `StructuralName`: for things like methods, classes, labels, etc. They are "structural" and it does not really make sense to talk about them being mutable or immutable. - Accordingly, we also have `DataType` and `StructuralType`. Accordingly, we will have: - `addDataName(String name, DataType type, Mutability mutability, int weight)`. The weight is optional, default `1`. `Mutability` has 2 options here: `MUTABLE` and `IMMUTABLE`. - `dataNames(Mutability mutability)` gets us a `DataNamesView`. `Mutability` can either pick `MUTABLE`, `IMMUTABLE` or `DONT_CARE`. - Optionally, we can filter with `subtypeOf(DataType)`, `exactOf(DataType)` or `supertypeOf(DataType)`. 
- As "terminal operators", we have: - `sample()` - `count()` - `hasAny()` - `toList()` - I will probably remove the `weight()` operator, as its use is not very clear; we could still add it later on. - `addStructuralName(String name, StructuralType type, int weight)`. The weight is optional, default `1`. - `structuralNames()` gets us a `StructuralNamesView`. - Optionally, we can filter with `subtypeOf(StructuralType)`, `exactOf(StructuralType)` or `supertypeOf(StructuralType)`. - As "terminal operators", we have: - `sample()` - `count()` - `hasAny()` - `toList()` Some additional feedback I want to incorporate: - Templates are often declared from innermost scope first to outermost last. But it can make sense to start reading the examples from the outermost first, to understand the context. - `Hook.set` should instead be `Hook.anchor`. That is clearer. - I need a very clear example that shows how `addDataName` interacts with the Template scope and the Hook scope, especially when we `Hook.insert`, and the name spills out one scope. I will work on this ASAP. @mhaessig @chhagedorn @robcasloz You can still review anything other than the Names; I only intend to refactor things around the Names. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2891271844 From epeter at openjdk.org Mon May 19 14:42:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 19 May 2025 14:42:40 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain-specific fuzzers (e.g. random expressions and statements).
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Offline review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/fcbd76a0..aedc5095 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=38-39 Stats: 21 lines in 1 file changed: 5 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From kvn at openjdk.org Mon May 19 14:53:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 19 May 2025 14:53:56 GMT Subject: RFR: 8357166: Many AOT tests failed with VM crash In-Reply-To: <-5pwU95zCnUwjVYpsSAVuPtoWyJIJbU0pffcg3p5qBU=.c15e324b-7425-4fa9-a459-387c37c8856b@github.com> References: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> <-5pwU95zCnUwjVYpsSAVuPtoWyJIJbU0pffcg3p5qBU=.c15e324b-7425-4fa9-a459-387c37c8856b@github.com> Message-ID: On Mon, 19 May 2025 07:30:31 GMT, Aleksey Shipilev wrote: >> Disable AOT runtime blobs generation when VerifyOops flag is on. >> AOT adapters are not affected - they don't have oops operation. >> >> Tested tier1 and tier7-comp (which failed without fix) > > Marked as reviewed by shade (Reviewer). 
Thank you, @shipilev and @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/25277#issuecomment-2891329725 From kvn at openjdk.org Mon May 19 14:56:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 19 May 2025 14:56:57 GMT Subject: Integrated: 8357166: Many AOT tests failed with VM crash In-Reply-To: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> References: <_BC6P3NWZR1E2i8UxpXIVf3UggbEh3eRkwDpfNcNUDM=.7cede7ec-df62-4c0a-9410-caa38ba18428@github.com> Message-ID: <0aEgXy9RHd3FTQFMOBhWFUIRVcqrIYvWtAxMWDfnbM0=.ca78d2c3-5f7e-41e2-8058-79084f7a9959@github.com> On Fri, 16 May 2025 22:22:21 GMT, Vladimir Kozlov wrote: > Disable AOT runtime blobs generation when VerifyOops flag is on. > AOT adapters are not affected - they don't have oops operation. > > Tested tier1 and tier7-comp (which failed without fix) This pull request has now been integrated. Changeset: 84a98ab4 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/84a98ab43ff268d0b10926b35320717d691337ae Stats: 13 lines in 1 file changed: 13 ins; 0 del; 0 mod 8357166: Many AOT tests failed with VM crash Reviewed-by: thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/25277 From yzheng at openjdk.org Mon May 19 15:19:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs Message-ID: This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. 
It also drops AMD64/AArch64/RISCV64.flags and RegisterArray ------------- Commit messages: - address comments - Add JVMCI support for APX EGPRs Changes: https://git.openjdk.org/jdk/pull/23159/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23159&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8334717 Stats: 546 lines in 18 files changed: 41 ins; 334 del; 171 mod Patch: https://git.openjdk.org/jdk/pull/23159.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23159/head:pull/23159 PR: https://git.openjdk.org/jdk/pull/23159 From yzheng at openjdk.org Mon May 19 15:19:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray keep alive ------------- PR Comment: https://git.openjdk.org/jdk/pull/23159#issuecomment-2724418056 From dnsimon at openjdk.org Mon May 19 15:19:15 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterAttributes.java line 58: > 56: * element at index i holds the attributes of the register whose number is i. > 57: */ > 58: public static RegisterAttributes[] createMap(RegisterConfig registerConfig, List<Register> registers) { We should remove raw arrays as much as possible in JVMCI and replace them with immutable Lists: * @return an immutable list whose length is the max register number in {@code registers} plus 1.
An * element at index i holds the attributes of the register whose number is i. */ public static List<RegisterAttributes> createMap(RegisterConfig registerConfig, List<Register> registers) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23159#discussion_r2033171362 From yzheng at openjdk.org Mon May 19 15:19:15 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 19 May 2025 15:19:15 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Tue, 8 Apr 2025 13:15:29 GMT, Doug Simon wrote: >> This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterAttributes.java line 58: > >> 56: * element at index i holds the attributes of the register whose number is i. >> 57: */ >> 58: public static RegisterAttributes[] createMap(RegisterConfig registerConfig, List<Register> registers) { > > We should remove raw arrays as much as possible in JVMCI and replace them with immutable Lists: > > * @return an immutable list whose length is the max register number in {@code registers} plus 1. An > * element at index i holds the attributes of the register whose number is i. > */ > public static List<RegisterAttributes> createMap(RegisterConfig registerConfig, List<Register> registers) { I have audited all the .clone() on array objects and changed as much as possible.
Let me know if there is still some opportunity ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23159#discussion_r2095952012 From dnsimon at openjdk.org Mon May 19 15:26:53 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 15:26:53 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: <_MgCOg5EY1Sa1zo0ZwQ5Xr8rU3cU5CI5GHxCwIHGoSo=.d90889b8-f821-4695-85cb-1c8727638e7a@github.com> On Mon, 19 May 2025 15:16:27 GMT, Yudi Zheng wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/code/RegisterAttributes.java line 58: >> >>> 56: * element at index i holds the attributes of the register whose number is i. >>> 57: */ >>> 58: public static RegisterAttributes[] createMap(RegisterConfig registerConfig, List<Register> registers) { >> >> We should remove raw arrays as much as possible in JVMCI and replace them with immutable Lists: >> >> * @return an immutable list whose length is the max register number in {@code registers} plus 1. An >> * element at index i holds the attributes of the register whose number is i. >> */ >> public static List<RegisterAttributes> createMap(RegisterConfig registerConfig, List<Register> registers) { > > I have audited all the .clone() on array objects and changed as much as possible. Let me know if there is still some opportunity Looks good - thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23159#discussion_r2095973654 From hgreule at openjdk.org Mon May 19 16:03:58 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 19 May 2025 16:03:58 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 19 May 2025 08:58:37 GMT, Manuel Hässig wrote: > The original code probably returned TypeInt::POS for the same reason you bring up below: I doubt that, as it doesn't account for the sign of the dividend at all here.
We also can't keep the existing behavior (see the section about monotonicity in the PR description). From my understanding, the node should also be kept alive no matter the value due to its control input. I'll test with returning TOP. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2096056366 From hgreule at openjdk.org Mon May 19 16:08:52 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Mon, 19 May 2025 16:08:52 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 19 May 2025 08:51:08 GMT, Manuel Hässig wrote: >> This change improves the precision of the `Mod(I|L)Node::Value()` functions. >> >> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. >> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. >> >> ### Monotonicity >> >> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. I now use `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). >> >> ### Testing >> >> I added tests for cases around the relevant bounds.
I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). >> >> Please review and let me know what you think. >> >> ### Other >> >> The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. >> >> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: >> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? >> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. > > src/hotspot/share/opto/divnode.cpp line 1242: > >> 1240: // The magnitude of the divisor is in range [1, 2^31]. >> 1241: // We know it isn't 0 as we handled that above. >> 1242: // That means at least one value is nonzero, so its absolute value is bigger than zero. > > Is that really what you checked above? AFAIU, above you check whether the divisor is a zero constant. But if the divisor is not a constant, then its range might still contain zero. You should check this claim using the bounds, otherwise this will not hold. We only care about the magnitude of the divisor here. 
`_lo == _hi == 0` can't be the case here anymore, because that means we have a constant 0. As we use the larger absolute value of the bounds, it can't be 0. We don't need to care about a 0 divisor (if we have a range of e.g., -2..2 here), as the node is kept alive as long as we can't prove in `Ideal` that the divisor isn't 0 (https://github.com/openjdk/jdk/blob/20a19bf545dd55f21b71eba2e2313dc12c359157/src/hotspot/share/opto/divnode.cpp#L1095-L1100) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2096071343 From dnsimon at openjdk.org Mon May 19 16:50:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 16:50:51 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: <6U3nTT2mClWCu8SHNL9JmMfwaKITOkvSmzI-3GAr-WY=.d51c748f-199c-4254-8555-15ab31ce78fd@github.com> On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray LGTM ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23159#pullrequestreview-2851438732 From dnsimon at openjdk.org Mon May 19 17:56:01 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 19 May 2025 17:56:01 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 Message-ID: As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: Error occurred during initialization of VM java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) 
This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. Alternative solutions include: 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created). 2. Adding a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. I think the solution in this PR is the most robust for the long term.
------------- Commit messages: - do not exit VM if libjvmci env creation fails Changes: https://git.openjdk.org/jdk/pull/25307/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357135 Stats: 29 lines in 3 files changed: 9 ins; 17 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From sparasa at openjdk.org Mon May 19 20:49:35 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 19 May 2025 20:49:35 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v28] In-Reply-To: References: Message-ID: <9zdkGwB0mR-chtMUInJpUZ5a7qBSYTIQbKdhvsRRk5E=.75b55d56-2be0-4c00-8e52-122c57f6e100@github.com> > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/x86/assembler_x86.cpp Change int b1 to int opcode_byte Co-authored-by: Jatin Bhateja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/b2e8fd2a..c3e8f9ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sviswanathan at openjdk.org Mon May 19 22:25:55 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 19 May 2025 22:25:55 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v2] In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Mon, 12 May 2025 12:17:11 GMT, Jatin Bhateja wrote: >> PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. >> AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. >> >> e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. 
Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. >> >> >> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 >> EVEX OP MR SIB DISP IMM >> --------------|----|----|----|---------------|-----| >> 62 6b c1 40 25 84 ec 40 30 20 10 ff >> >> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 >> For full vector width operation, scalar matches with vector size, hence scale N = 64 >> effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 >> EVEX OP MR SIB DISP IMM >> -------------|----|---|---|-----------|---| >> 62 6b c1 40 25 44 ec 01 ff >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Correcting tuple types in some assembler routines Some more places correction needs to be done for address attributes: 1. evpmovzxbd tuple type needs change from HVM to QVM. 2. Address attribute missing for two additional instructions taking Address as input/output: vpermb, paddd. 3. The input_size_in_bits should be EVEX_32bit for cvtsi2ssq, cvtsi2sdq. 4. The input_size_in_bits should be EVEX_64bit for evpgatherdq, evpscatterdq, evgatherdpd, evscatterdpd. src/hotspot/cpu/x86/assembler_x86.cpp line 11379: > 11377: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 11378: InstructionAttr attributes(vector_len, /* vex_w */ false,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > 11379: attributes.set_address_attributes(/* tuple_type */ EVEX_FVM,/* input_size_in_bits */ EVEX_NObit); No address attribute needed for this instruction. 
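As a side illustration of the compressed disp8*N rule from the PR description quoted above, here is a minimal standalone sketch in plain C++. The function name `compress_disp8` is hypothetical and is not the HotSpot assembler routine; the scale `n` is assumed to be a positive power of two. The sketch shows that two conditions have to hold: the displacement must be a multiple of N, and the quotient must fit in a signed byte. With N = 64, `0x40` satisfies both, while `0x10203040` is a multiple of 64 but its quotient is far too large for one byte.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: decides whether a displacement can
// use the compressed 8-bit form for scale factor n (assumed > 0).
// On success, *disp8 receives the byte that would be encoded.
bool compress_disp8(int32_t disp, int32_t n, int8_t* disp8) {
  assert(n > 0);
  if (disp % n != 0) {
    return false;                 // not a multiple of the scale
  }
  int32_t quotient = disp / n;
  if (quotient < -128 || quotient > 127) {
    return false;                 // multiple of N, but does not fit in 8 bits
  }
  *disp8 = static_cast<int8_t>(quotient);
  return true;
}
```

For the two `vpternlogq` examples in the description, `compress_disp8(0x40, 64, &b)` succeeds with `b == 1`, matching the single `01` displacement byte shown, and `compress_disp8(0x10203040, 64, &b)` fails, so the full 4-byte displacement is needed.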
src/hotspot/cpu/x86/assembler_x86.cpp line 11408: > 11406: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 11407: InstructionAttr attributes(vector_len, /* vex_w */ false,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > 11408: attributes.set_address_attributes(/* tuple_type */ EVEX_FVM,/* input_size_in_bits */ EVEX_NObit); No address attribute needed for this instruction. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2892419514 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2096520907 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2096521528 From dlong at openjdk.org Mon May 19 23:22:52 2025 From: dlong at openjdk.org (Dean Long) Date: Mon, 19 May 2025 23:22:52 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:43:38 GMT, Marc Chevalier wrote: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. src/hotspot/share/opto/loopTransform.cpp line 1903: > 1901: jlong trip_count = (limit_con - init_con + stride_m)/new_stride_con; > 1902: // New trip count should satisfy next conditions. > 1903: assert(trip_count > 0 && (julong)trip_count <= (julong)1 << (sizeof(juint)*BitsPerByte-1), "sanity"); Suggestion: assert((julong)trip_count * 2 <= max_juint, "sanity"); This should catch negative values and any value that would make new_trip_count*2 below overflow. 
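To make the two candidate conditions easier to compare, here is a small standalone sketch in plain C++. Fixed-width types stand in for HotSpot's `jlong`/`julong`, `kMaxJuint` stands in for `max_juint` (assumed to be 2^32 - 1), and the function names are hypothetical. It only evaluates the two expressions as they appear in the mail above at a few boundary values; it says nothing about which bound the surrounding unrolling code actually requires.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for HotSpot's max_juint (assumption: 2^32 - 1).
static const uint64_t kMaxJuint = 0xFFFFFFFFull;

// Condition from the patch: 0 < trip_count <= 2^31.
bool patch_condition(int64_t trip_count) {
  return trip_count > 0 &&
         (uint64_t)trip_count <= ((uint64_t)1 << (sizeof(uint32_t) * 8 - 1));
}

// Condition from the suggestion: (julong)trip_count * 2 <= max_juint.
bool suggested_condition(int64_t trip_count) {
  return (uint64_t)trip_count * 2 <= kMaxJuint;
}

// Documents where the two conditions agree and where they differ.
void check_boundaries() {
  const int64_t two_pow_31 = (int64_t)1 << 31;

  // Both accept 1 and 2^31 - 1, both reject -1 and 2^31 + 1.
  assert(patch_condition(1) && suggested_condition(1));
  assert(patch_condition(two_pow_31 - 1) && suggested_condition(two_pow_31 - 1));
  assert(!patch_condition(-1) && !suggested_condition(-1));
  assert(!patch_condition(two_pow_31 + 1) && !suggested_condition(two_pow_31 + 1));

  // They differ at 2^31: doubling gives 2^32, which exceeds 2^32 - 1.
  assert(patch_condition(two_pow_31) && !suggested_condition(two_pow_31));

  // They also differ at 0: only the patch condition requires trip_count > 0.
  assert(!patch_condition(0) && suggested_condition(0));
}
```

Calling `check_boundaries()` passes silently. In this sketch the suggested form is equivalent to `trip_count <= 2^31 - 1` for non-negative inputs, so whether `2^31` itself should be accepted is the point the two formulations disagree on.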
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2096599846 From iveresov at openjdk.org Tue May 20 00:37:14 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 20 May 2025 00:37:14 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v22] In-Reply-To: References: Message-ID: <_HjblHzZnDzpYl3gZrcQL7FWeqJ_7jxEY_LwHQV6AiU=.ad599c04-7bd1-48db-9485-dd0dab940930@github.com> > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 87 commits: - Merge branch 'master' into pp2 - 8357284: runtime/cds/appcds/aotProfile/AOTProfileFlags.java fails on non-debug platform - 8357283: compiler/debug/TestStressBailout.java hangs when running with AOT cache - Merge branch 'master' into pp2 - Address Ioi's comments - Merge branch 'master' into pp2 - Address Ioi's comments - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Reviewed-by: vlivanov, kvn - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing Reviewed-by: naoto - 8356667: GenShen: Eliminate races with ShenandoahFreeSet::available() Reviewed-by: wkemper - ... 
and 77 more: https://git.openjdk.org/jdk/compare/890456f0...2740c2f2 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=21 Stats: 3325 lines in 59 files changed: 3111 ins; 100 del; 114 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From fyang at openjdk.org Tue May 20 01:36:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 20 May 2025 01:36:52 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:46:12 GMT, Robbin Ehn wrote: > Hi, please consider. > > While working on https://github.com/openjdk/jdk/pull/25252, I notice: > - Major op code was just repeat > - Width coded in binary > - Stores have mixed up rs1 and rs2 > - Bonus, fsd used a macro for no reason > > I think this improves readability. > > Tested tier1 > > Thanks, Robbin Nice cleanup! Thanks. src/hotspot/cpu/riscv/assembler_riscv.hpp line 730: > 728: void _ld(Register Rd, Register Rs, const int32_t offset) { > 729: load_base(Rd, Rs, offset); > 730: } Question: Can we refactor and move definition of `flh`, `flw` and `fld` here? The definition of `fp_load` [1] looks quite similar as `load_base` here. [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L1352 ------------- PR Review: https://git.openjdk.org/jdk/pull/25253#pullrequestreview-2852297912 PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2096692780 From sparasa at openjdk.org Tue May 20 01:45:52 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 01:45:52 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v29] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. 
The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. > > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactor to use opcode_byte & ternary op for comparison ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/c3e8f9ec..c0086590 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=27-28 Stats: 8 lines in 1 file changed: 0 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From vlivanov at openjdk.org Tue May 20 02:10:32 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 May 2025 02:10:32 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence Message-ID: This PR introduces C2 support for `Reference.reachabilityFence()`. After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. 
The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." 
Testing: - [x] hs-tier1 - hs-tier8 - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations - [x] java/lang/foreign microbenchmarks ------------- Commit messages: - Merge branch 'master' into 8290892.rf - 8290892: C2: Intrinsify Reference.reachabilityFence Changes: https://git.openjdk.org/jdk/pull/25315/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8290892 Stats: 1152 lines in 36 files changed: 1092 ins; 20 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From xgong at openjdk.org Tue May 20 02:24:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 May 2025 02:24:51 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Mon, 19 May 2025 03:10:46 GMT, Xiaohong Gong wrote: >> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). >> >> Two key areas require improvement: >> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. >> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. 
>> >> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. >> >> Main changes: >> 1. Java-side API refactoring: >> - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on >> architectures like AArch64. >> 2. C2 compiler IR refactoring: >> - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. >> 3. Backend changes: >> - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. >> >> Performance: >> The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: >> >> Benchmark Mode Cnt Unit SIZE Before After Gain >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.31... > > Ping again~ could any one please take a look at this PR? Thanks a lot! > Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing and get back. > > Do you have any idea about following regression? 
> > ``` > GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 64 55844.814 48311.847 0.86 > GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 256 15139.459 13009.848 0.85 > GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 1024 3861.834 3284.944 0.85 > GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 4096 938.665 817.673 0.87 > ``` > > Best Regards Yes, I also observed such regression. After analyzing, I found it was caused by the java side changes, which influences the range check elimination inside `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The root cause is the counted loop in following benchmark case is not recognized by compiler as expected: public void microByteGather256() { for (int i = 0; i < SIZE; i += B256.length()) { ByteVector.fromArray(B256, barr, 0, index, i) .intoArray(bres, i); } } ``` The loop iv phi node is not recognized successfully when C2 recognize the counted loop pattern, because it was casted twice with `CastII` in this case. The ideal graph looks like: Loop \ \ / ------------------------- \ / | Phi | | | CastII | | | CastII | | | \ ConI | \ | | AddVI | |--------------------| Relative code is https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667. Before the change, the graph should be: Loop \ \ / ------------------------- \ / | Phi | | | CastII | | | | \ ConI | \ | | AddVI | |--------------------| ``` The difference comes from the index generation in `ByteVector.fromArray()` (I mean calling of `IntVector.fromArray()` in java). Before, the `i` in above loop is not directly used by `IntVector.fromArray()`. Instead, it was used by another loop phi node. Hence, there is no additional `CastII`. But after my change, the loop in the java implementation of `ByteVector.fromArray()` is removed, and `i` is directly used by `IntVector.fromArray()` . It will be used by boundary check before loading the indexes. Hence another `CastII` is generated. 
When recognizing the loop iv phi node, the code first checks whether the `Phi` is used by a `CastII` and, if so, takes the input of the `CastII`. But this check only happens once. See https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1659 if (xphi->Opcode() == Op_Cast(iv_bt)) { xphi = xphi->in(1); } Once the `PhiNode` is cast more than once, the pattern is not recognized. I think we should refine the logic of recognizing the counted loop by changing the above `if` to `while`, so that the iv phi node is recognized successfully. Potential change should be: while (xphi->Opcode() == Op_Cast(iv_bt)) { xphi = xphi->in(1); } I've tested this change, and found that the regressed benchmarks recover to their previous performance. Considering I'm not familiar with C2's loop transform code, I prefer to do more investigation for this issue, and may fix it with a follow-up patch. Any suggestions? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2892720654 From vlivanov at openjdk.org Tue May 20 03:29:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 20 May 2025 03:29:54 GMT Subject: RFR: 8347901: C2 should remove unused leaf / pure runtime calls In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 13:18:33 GMT, Marc Chevalier wrote: > A first part toward a better support of pure functions. > > ## Pure Functions > > Pure functions (considered here) are functions that have no side effects, no effect on the control flow (no exception or such), cannot deopt etc. It's really a function that you can execute anywhere, with whichever arguments without effect other than wasting time. Integer division is not pure as dividing by zero is throwing. But many floating point functions will just return `NaN` or `+/-infinity` in problematic cases. > > ## Scope > > We are not going all powerful for now! It's mostly about identifying some pure functions and being able to remove them if the result is unused.
Some other things are not part of this PR, on purpose. Especially, this PR doesn't propose a way to move pure calls around. The reason is that pure calls are macro nodes later expanded into other, regular calls, which require a control input. To be able to do the expansion, we just keep the control in the pure call as well. > > ## Implementation Overview > > We created here some new node kind for pure calls that are expanded into regular calls during macro expansion. This also allows the removal of `ModD` and `ModF` nodes that have their pure equivalent now. They are surprisingly hard to unify with other floating point functions from an implementation point of view! > > IR framework and IGV needed a little bit of fixing. > > Thanks, > Marc I'm just pointing out that delaying lowering decision till matching phase neither makes scheduling easier nor makes implementation simpler. For loop opts it is important to know when loops contain calls and act accordingly (by trying to hoist relevant nodes out of loops and disabling some optimizations when the calls are still there). The difference between CFG nodes effectively pinned AT some point and non-CFG nodes with control dependency (effectively pushing them UNDER their control input) becomes insignificant once CFG nodes depend solely on control. In other words, once a call node doesn't consume/produce memory and I/O states, it becomes straightforward to move it around in CFG when desired (between it's inputs and users). Speaking of scheduling, would default scheduling heuristics do a good job? The case of expensive nodes exemplifies the need of custom scheduling heuristics for such nodes. Implementation-wise, lowering during matching becomes platform-specific and requires each platform to introduce `effect(CALL)` AD instructions. Moreover, each call shape (determined by arity and argument kinds) has to be explicitly handled with a dedicated AD instruction. 
And it doesn't benefit from existing support of call nodes every platform already has. > Ideally, what we want to do with expensive data nodes is to common them aggressively like any other data node. Then, during code motion, we can clone them if it is beneficial. The current implementation of expensive nodes can definitely be improved, but the nice property it has is that it only decreases the number of nodes through careful commoning during loop opts. Once cloning is allowed, there's a new problem to care about: the case of too many clones. A simple incremental improvement would be to teach `PhaseIdealLoop::process_expensive_nodes()` to push expensive nodes closer to their users if they are on less frequent code paths. Then it can be taught (how and when) to clone expensive nodes between multiple users. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24966#issuecomment-2892797262 From epeter at openjdk.org Tue May 20 05:37:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 05:37:51 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong wrote: >> Ping again~ could any one please take a look at this PR? Thanks a lot! > >> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing and get back. >> >> Do you have any idea about following regression? >> >> ``` >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 64 55844.814 48311.847 0.86 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 256 15139.459 13009.848 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 1024 3861.834 3284.944 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 4096 938.665 817.673 0.87 >> ``` >> >> Best Regards > > Yes, I also observed such regression. 
After analyzing, I found it was caused by the java side changes, which influences the range check elimination inside `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The root cause is the counted loop in following benchmark case is not recognized by compiler as expected: > > public void microByteGather256() { > for (int i = 0; i < SIZE; i += B256.length()) { > ByteVector.fromArray(B256, barr, 0, index, i) > .intoArray(bres, i); > } > } > ``` > The loop iv phi node is not recognized successfully when C2 recognize the counted loop pattern, because it was casted twice with `CastII` in this case. The ideal graph looks like: > > Loop > \ > \ / ----------------------------- > \ / | > Phi | > | | > CastII | > | | > CastII | > | | > \ ConI | > \ | | > AddVI | > |-------------------------| > > Relative code is https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667. > > Befor... @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. > Yes, I also observed such regression. It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. Even better would be if we could do the other patch first, so we never even encounter a regression. 
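For readers following the `if` versus `while` discussion earlier in this thread, here is a toy sketch of the difference, written in plain C++. `ToyNode`, `ToyOpcode`, and both helper functions are hypothetical stand-ins and are not the C2 `Node` classes or `Node::uncast`. The sketch only shows the shape of the pattern match on a `Phi <- CastII <- CastII` chain; it does not model whether skipping several casts is actually safe in C2, which is the open question raised above.

```cpp
#include <cassert>

// Toy stand-ins for C2 IR nodes; these are NOT the HotSpot classes.
enum ToyOpcode { ToyPhi, ToyCastII, ToyOther };

struct ToyNode {
  ToyOpcode      opcode;
  const ToyNode* input;   // plays the role of in(1); nullptr if there is none
};

// Mirrors the existing `if`: looks through at most one cast.
const ToyNode* strip_one_cast(const ToyNode* n) {
  if (n->opcode == ToyCastII) {
    assert(n->input != nullptr);
    n = n->input;
  }
  return n;
}

// Mirrors the proposed `while`: looks through a whole chain of casts.
const ToyNode* strip_all_casts(const ToyNode* n) {
  while (n->opcode == ToyCastII) {
    assert(n->input != nullptr);
    n = n->input;
  }
  return n;
}

// Builds Phi <- CastII <- CastII and reports whether only the `while`
// variant reaches the Phi.
bool only_loop_variant_finds_phi() {
  ToyNode phi   = { ToyPhi,    nullptr };
  ToyNode cast1 = { ToyCastII, &phi };
  ToyNode cast2 = { ToyCastII, &cast1 };
  bool one_reaches_phi = strip_one_cast(&cast2)->opcode == ToyPhi;
  bool all_reaches_phi = strip_all_casts(&cast2)->opcode == ToyPhi;
  return !one_reaches_phi && all_reaches_phi;
}
```

With a single cast both helpers reach the `Phi`; with two casts only the loop variant does, which matches the missed counted-loop recognition described for the `microByteGather256` benchmark.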
------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893017948 From xgong at openjdk.org Tue May 20 05:42:51 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 20 May 2025 05:42:51 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Tue, 20 May 2025 02:22:13 GMT, Xiaohong Gong wrote: >> Ping again~ could any one please take a look at this PR? Thanks a lot! > >> Hi @XiaohongGong , Very nice work!, Looks good to me, will do some testing and get back. >> >> Do you have any idea about following regression? >> >> ``` >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 64 55844.814 48311.847 0.86 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 256 15139.459 13009.848 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 1024 3861.834 3284.944 0.85 >> GatherOperationsBenchmark.microByteGather256 thrpt 30 ops/ms 4096 938.665 817.673 0.87 >> ``` >> >> Best Regards > > Yes, I also observed such regression. After analyzing, I found it was caused by the java side changes, which influences the range check elimination inside `IntVector.fromArray()` and `ByteVector.intoArray()` in the benchmark. The root cause is the counted loop in following benchmark case is not recognized by compiler as expected: > > public void microByteGather256() { > for (int i = 0; i < SIZE; i += B256.length()) { > ByteVector.fromArray(B256, barr, 0, index, i) > .intoArray(bres, i); > } > } > ``` > The loop iv phi node is not recognized successfully when C2 recognize the counted loop pattern, because it was casted twice with `CastII` in this case. The ideal graph looks like: > > Loop > \ > \ / ----------------------------- > \ / | > Phi | > | | > CastII | > | | > CastII | > | | > \ ConI | > \ | | > AddVI | > |-------------------------| > > Relative code is https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/loopnode.cpp#L1667. 
> > Befor... > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. > > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): > > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 > > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. Sounds good to me. I will have a deep investigation for it. Thanks! > > Yes, I also observed such regression. > > It would be nice if you proactively mentioned regressions, so it does not have to be pointed out by reviewers. > > For me, it could be ok to fix it in a follow-up patch. I think we are too close to RDP1 for JDK25 now anyway, and so we could push this patch here into JDK26, and then we have enough time in JDK26 to investigate the regression. Even better would be if we could do the other patch first, so we never even encounter a regression. Sounds good to me. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2893026228 From jpai at openjdk.org Tue May 20 06:10:55 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 20 May 2025 06:10:55 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:26:30 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: > > - Move intrinsic to be a subsection; just one most common function of the annotation > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti > - Shorter first sentence > - Updates, thanks to John > - Refine validation and defensive copying > - 8355223: Improve documentation on @IntrinsicCandidate src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 39: > 37: *

Intrinsification

> 38: * The most frequently special treatment is intrinsification, which replaces a > 39: * candidate method's body, bytecode or native, with handwritten platform Is this sentence missing the word "code" after "native"? Should it have been: > bytecode or native code, ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2097002980 From jpai at openjdk.org Tue May 20 06:10:56 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 20 May 2025 06:10:56 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 19:55:58 GMT, John R Rose wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - Move intrinsic to be a subsection; just one most common function of the annotation >> - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate >> - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate >> - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java >> >> Co-authored-by: Raffaello Giulietti >> - Shorter first sentence >> - Updates, thanks to John >> - Refine validation and defensive copying >> - 8355223: Improve documentation on @IntrinsicCandidate > > src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 47: > >> 45: * intrinsics necessary. >> 46: *

>> 47: * Intrinsification may never happen, or happen at any moment during execution. > > s/or happen/or may happen/ (easier to parse) Hello John, are there any HotSpot VM flags that can be enabled to check whether or not intrinsification happens for a particular method during the lifetime of an application? Should any of those flags be documented in this proposed text? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2097001318 From kwei at openjdk.org Tue May 20 06:14:00 2025 From: kwei at openjdk.org (Kuai Wei) Date: Tue, 20 May 2025 06:14:00 GMT Subject: RFR: 8356328: Some C2 IR nodes miss size_of() function [v5] In-Reply-To: References: <3LgkcYxzGDgfPGcegyzyM_Z8Fpkc6aZEd9B1OzvhB2E=.d5dee5b6-dc41-42a9-b7b1-843952a845b9@github.com> Message-ID: On Mon, 19 May 2025 09:33:25 GMT, Aleksey Shipilev wrote: > Post-review comment: Doesn't this mean that super-class `Node::size_of` gives us a wrong answer for any node that has its own fields? > > ``` > uint Node::size_of() const { return sizeof(*this); } > ``` > > So, this looks mechanically preventable by making `Node::size_of` pure virtual, and thus _forcing_ subclasses to implement its own `size_of`. Good idea. I plan to check size_of/cmp/hash for every IR node. I may use a static analysis tool to do this job. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25081#issuecomment-2893078092 From yzheng at openjdk.org Tue May 20 06:14:09 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 20 May 2025 06:14:09 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs [v2] In-Reply-To: References: Message-ID: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables.
It also drops AMD64/AArch64/RISCV64.flags and RegisterArray Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: fix tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23159/files - new: https://git.openjdk.org/jdk/pull/23159/files/aabb8996..37e4d2a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23159&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23159&range=00-01 Stats: 15 lines in 3 files changed: 3 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23159.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23159/head:pull/23159 PR: https://git.openjdk.org/jdk/pull/23159 From jpai at openjdk.org Tue May 20 06:21:54 2025 From: jpai at openjdk.org (Jaikiran Pai) Date: Tue, 20 May 2025 06:21:54 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 22:26:30 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: > > - Move intrinsic to be a subsection; just one most common function of the annotation > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti > - Shorter first sentence > - Updates, thanks to John > - Refine validation and defensive copying > - 8355223: Improve documentation on @IntrinsicCandidate src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 50: > 48: * For example, the bytecodes of a candidate method may be executed by lower > 49: * compilation tiers of VM execution, while higher compilation tiers may replace > 50: * the bytecodes with specialized assembly code and/or compiler IR. Therefore, > while higher compilation tiers may replace the bytecodes with specialized assembly code and/or compiler IR Is there ever a case where, for an `@IntrinsicCandidate` method, the runtime will choose to execute the intrinsic for that method for a certain duration and then, at a later point in time, replace the intrinsic with compiler-generated code? In other words, once the runtime executes the intrinsic implementation for an `@IntrinsicCandidate` method, will the method's implementation be switched to anything else during the lifetime of an application?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2097016882 From mhaessig at openjdk.org Tue May 20 06:59:53 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 20 May 2025 06:59:53 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Mon, 19 May 2025 16:06:21 GMT, Hannes Greule wrote: >> src/hotspot/share/opto/divnode.cpp line 1242: >> >>> 1240: // The magnitude of the divisor is in range [1, 2^31]. >>> 1241: // We know it isn't 0 as we handled that above. >>> 1242: // That means at least one value is nonzero, so its absolute value is bigger than zero. >> >> Is that really what you checked above? AFAIU, above you check whether the divisor is a zero constant. But if the divisor is not a constant, then its range might still contain zero. You should check this claim using the bounds, otherwise this will not hold. > > We only care about the magnitude of the divisor here. `_lo == _hi == 0` can't be the case here anymore, because that means we have a constant 0. As we use the larger absolute value of the bounds, it can't be 0. We don't need to care about a 0 divisor (if we have a range of e.g., -2..2 here), as the node is kept alive as long as we can't prove in `Ideal` that the divisor isn't 0 (https://github.com/openjdk/jdk/blob/20a19bf545dd55f21b71eba2e2313dc12c359157/src/hotspot/share/opto/divnode.cpp#L1095-L1100) That makes sense. Thank you for the explanation.
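The magnitude argument quoted above can be sketched in plain Java. This is only an illustration and not the HotSpot code under review: `ModMagnitudeSketch`, `maxMagnitude` and `remainderBound` are hypothetical names, and the sketch relies solely on the Java rule that `x % d` has a magnitude strictly smaller than `|d|` for a nonzero `d`.

```java
public class ModMagnitudeSketch {
    // Hypothetical helper: largest absolute value in the int range [lo, hi].
    // Widened to long so that |Integer.MIN_VALUE| = 2^31 is representable.
    static long maxMagnitude(int lo, int hi) {
        return Math.max(Math.abs((long) lo), Math.abs((long) hi));
    }

    // Hypothetical helper: upper bound on |x % d| for any nonzero d in [lo, hi].
    // The bound is zero-free only if the range is not the constant 0.
    static long remainderBound(int lo, int hi) {
        long m = maxMagnitude(lo, hi);
        if (m == 0) {
            // lo == hi == 0: the constant-zero divisor, handled separately
            throw new IllegalArgumentException("constant zero divisor");
        }
        return m - 1;
    }

    public static void main(String[] args) {
        // A range such as -2..2 contains 0, yet its larger magnitude is nonzero.
        System.out.println(maxMagnitude(-2, 2));
        System.out.println(remainderBound(-2, 2));
        // The magnitude can reach 2^31, which does not fit in an int.
        System.out.println(maxMagnitude(Integer.MIN_VALUE, -1));
    }
}
```

The range -2..2 shows the point made in the reply: zero being inside the range does not make the larger absolute bound zero; only the constant range 0..0 does.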
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2097128633 From mchevalier at openjdk.org Tue May 20 08:29:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 20 May 2025 08:29:52 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: On Mon, 19 May 2025 23:20:18 GMT, Dean Long wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > src/hotspot/share/opto/loopTransform.cpp line 1903: > >> 1901: jlong trip_count = (limit_con - init_con + stride_m)/new_stride_con; >> 1902: // New trip count should satisfy next conditions. >> 1903: assert(trip_count > 0 && (julong)trip_count <= (julong)1 << (sizeof(juint)*BitsPerByte-1), "sanity"); > > Suggestion: > > assert((julong)trip_count * 2 <= max_juint, "sanity"); > > This should catch negative values and any value that would make new_trip_count*2 below overflow. I'm not convinced this is relaxed enough or that it shouldn't overflow. Where we set trip_count: https://github.com/openjdk/jdk/blob/e961b13cd68bc352b86af17c7e53df8537519beb/src/hotspot/share/opto/loopTransform.cpp#L133-L141 we have a check that the trip count is `< 2^32 - 1`, but it seems to me that the value of `trip_count` there might be 2^32 or 2^32-1 (same computation as the code I'm fixing). That's fine, I guess: if it does not fit in a `uint`, we don't record it. In the code I'm touching, `old_trip_count` is the value previously stored in the loop head. In the case where the new `trip_count` is 2^31, the old_trip_count hasn't been set since construction, so it's still `2^32 - 1` but without the exact flag (not sure what it means). So in the case where the new trip_count is 2^31 and old_trip_count is 2^32-1, the `*2` overflows and we get `adjust_min_trip == true`, which I presume is harmless (or maybe necessary?). With the version you suggest, we would guard against the overflow and allow `trip_count == 2^31-1`, but at the cost of crashing in the case of `trip_count == 2^31`, which seems possible to me (and the overflow would still happen in product builds).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2097319830 From dzhang at openjdk.org Tue May 20 08:40:35 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 20 May 2025 08:40:35 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions [v2] In-Reply-To: References: Message-ID: > As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: > Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. > > Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. > > ### Testing > qemu-system 9.2.3 with UseRVV (ubuntu24.10): > * [x] Run test/jdk/jdk/incubator/vector (fastdebug) > * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Remove added ins_cost(VEC_COST) due to merging the main branch - Merge branch 'master' into master-remove-ins_cost - 8356924: RISC-V: Clean up cost for vector instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25221/files - new: https://git.openjdk.org/jdk/pull/25221/files/80ab16a5..44521d8a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25221&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25221&range=00-01 Stats: 23889 lines in 747 files changed: 8828 ins; 11491 del; 3570 mod Patch: https://git.openjdk.org/jdk/pull/25221.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25221/head:pull/25221 PR: https://git.openjdk.org/jdk/pull/25221 From rcastanedalo at openjdk.org Tue May 20 08:46:02 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 20 May 2025 08:46:02 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: <2TRfqPePF5VnERueckcKG9YeMKZaulJ_t1JjAIoCmso=.9c6b2139-0a59-43e6-81ce-b5bc5c649744@github.com> On Mon, 19 May 2025 14:42:40 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String).
We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Offline review test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 197: > 195: *

> 196: * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and
> 197: * {@link TestAdvanced}.

These links cannot be resolved by `javadoc` when using the command recommended in the PR description:

$ javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework
Loading source files for package compiler.lib.template_framework...
Constructing Javadoc information...
Building index for all the packages and classes...
Standard Doclet version 25-internal-2025-05-20-0715327.rocastan.open
Building tree for all the packages and classes...
test/hotspot/jtreg/compiler/lib/template_framework/Template.java:196: error: reference not found
 * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and
 ^
test/hotspot/jtreg/compiler/lib/template_framework/Template.java:196: error: reference not found
 * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and
 ^
test/hotspot/jtreg/compiler/lib/template_framework/Template.java:197: error: reference not found
 * {@link TestAdvanced}.
 ^
(...)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097382034

From duke at openjdk.org Tue May 20 08:46:07 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Tue, 20 May 2025 08:46:07 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v12]
In-Reply-To: References:
Message-ID: <6pVNNL7kKbqHZP4Sj0tRGg1QCQbqrGZzIOMXd1jxGl4=.10d6ec8b-3661-4676-9224-de7f30f00c72@github.com>

> Following [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic reduces Unsafe setMemory time considerably.
>
> On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the following
>
> before the patch
>
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision:

  update code for optimize

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23890/files
  - new: https://git.openjdk.org/jdk/pull/23890/files/61648e2e..af6995d7

Webrevs:
  - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=11
  - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=10-11

  Stats: 48 lines in 1 file changed: 20 ins; 0 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/23890.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890

PR: https://git.openjdk.org/jdk/pull/23890

From duke at openjdk.org Tue May 20 08:51:56 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Tue, 20 May 2025 08:51:56 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v12]
In-Reply-To: <6pVNNL7kKbqHZP4Sj0tRGg1QCQbqrGZzIOMXd1jxGl4=.10d6ec8b-3661-4676-9224-de7f30f00c72@github.com>
References: <6pVNNL7kKbqHZP4Sj0tRGg1QCQbqrGZzIOMXd1jxGl4=.10d6ec8b-3661-4676-9224-de7f30f00c72@github.com>
Message-ID: <0rDDqwSnbG9mfLnxnyKeL9kvPLKXyFDqNq4a9DG0scU=.66eeb034-b541-46dd-80e7-93bb5540b54c@github.com>

On Tue, 20 May 2025 08:46:07 GMT, Anjian-Wen wrote:

>> Following [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic reduces Unsafe setMemory time considerably.
>>
>> On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the following
>>
>> before the patch
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
>> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
>> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
>> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
>> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
>> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
>> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
>> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
>> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
>> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
>> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
>> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
>> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
>> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
>> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
>> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
>> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
>> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
>> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/o...
>
> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision:
>
> update code for optimize

@feilongjiang @RealFYang The MemorySegmentFillUnsafe test shows that the time drops from `29.728 ± 0.294` to `23.747 ± 0.215` when the count is 7, which is a very good result. Thanks for the commit! Below are the JMH test results.

before unroll

Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   30   23.235 ± 0.092  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   30   20.672 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   30   20.686 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   30   19.599 ± 0.116  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   30   20.793 ± 0.144  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   30   20.707 ± 0.058  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   30   21.387 ± 0.093  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   30   25.170 ± 0.113  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   30   31.145 ± 0.284  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   30   26.315 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   30   46.668 ± 0.611  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   30   49.265 ± 0.569  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   30   62.224 ± 1.244  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   30   61.213 ± 0.788  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   30   23.224 ± 0.077  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   30   20.673 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   30   20.679 ± 0.016  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   30   19.779 ± 0.349  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   30   20.672 ± 0.004  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   30   20.803 ± 0.077  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   30   21.329 ± 0.037  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   30   25.131 ± 0.086  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   30   31.021 ± 0.227  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   30   26.939 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   30   47.253 ± 0.397  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   30   47.614 ± 0.267  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   30   61.818 ± 0.407  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   30   62.879 ± 0.901  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   30   20.561 ± 0.212  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   30   22.979 ± 0.196  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   30   25.152 ± 0.545  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   30   27.713 ± 0.243  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   30   27.877 ± 0.433  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   30   28.356 ± 0.159  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   30   29.442 ± 0.008  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   30   34.050 ± 0.497  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   30   34.128 ± 0.215  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   30   33.516 ± 0.157  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   30   35.779 ± 0.094  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   30   38.035 ± 0.113  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   30   50.912 ± 0.142  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   30   50.586 ± 0.070  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   30   20.307 ± 0.211  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   30   22.574 ± 0.017  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   30   24.593 ± 0.240  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   30   27.805 ± 0.206  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   30   26.974 ± 0.058  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   30   28.188 ± 0.011  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   30   29.728 ± 0.294  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   30   31.559 ± 0.104  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   30   36.024 ± 0.149  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   30   37.215 ± 0.201  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   30   38.211 ± 0.011  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   30   39.056 ± 0.221  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   30   53.070 ± 0.351  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   30   53.406 ± 0.178  ns/op

after unroll

Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   30   23.424 ± 0.200  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   30   20.679 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   30   20.769 ± 0.105  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   30   19.432 ± 0.018  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   30   20.675 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   30   20.734 ± 0.089  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   30   21.305 ± 0.010  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   30   24.605 ± 0.466  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   30   31.731 ± 0.521  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   30   26.319 ± 0.007  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   30   46.153 ± 0.413  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   30   48.146 ± 0.345  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   30   61.937 ± 0.301  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   30   61.462 ± 0.546  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   30   23.202 ± 0.077  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   30   20.692 ± 0.019  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   30   20.678 ± 0.009  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   30   19.808 ± 0.373  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   30   21.633 ± 0.859  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   30   20.775 ± 0.116  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   30   21.395 ± 0.092  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   30   25.065 ± 0.012  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   30   31.904 ± 0.384  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   30   27.172 ± 0.199  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   30   48.113 ± 1.377  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   30   48.306 ± 0.413  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   30   61.440 ± 0.128  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   30   62.360 ± 0.342  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   30   21.759 ± 0.176  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   30   22.074 ± 0.068  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   30   21.303 ± 0.011  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   30   23.178 ± 0.006  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   30   23.189 ± 0.011  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   30   23.848 ± 0.072  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   30   23.393 ± 0.151  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   30   33.539 ± 0.169  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   30   36.204 ± 0.391  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   30   34.218 ± 0.730  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   30   35.807 ± 0.124  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   30   37.984 ± 0.065  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   30   50.843 ± 0.133  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   30   50.643 ± 0.078  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   30   21.782 ± 0.413  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   30   22.102 ± 0.073  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   30   21.727 ± 0.406  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   30   23.175 ± 0.007  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   30   23.402 ± 0.203  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   30   23.791 ± 0.007  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   30   23.747 ± 0.215  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   30   31.518 ± 0.073  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   30   36.252 ± 0.071  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   30   37.290 ± 0.236  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   30   38.373 ± 0.163  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   30   38.947 ± 0.300  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   30   52.648 ± 0.189  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   30   53.219 ± 0.195  ns/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2893516321

From thartmann at openjdk.org Tue May 20 08:52:56 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Tue, 20 May 2025 08:52:56 GMT
Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence
In-Reply-To: References:
Message-ID:

On Tue, 20 May 2025 00:49:49 GMT, Vladimir Ivanov wrote:

> This PR introduces C2 support for `Reference.reachabilityFence()`.
>
> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected.
>
> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
>
> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix.
> > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 93: > 91: return payload[id][offset]; > 92: } finally { > 93: // Reference.reachabilityFence(this); Drive-by comment: Is this intentionally disabled? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2097397919 From epeter at openjdk.org Tue May 20 08:54:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 08:54:12 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: <2TRfqPePF5VnERueckcKG9YeMKZaulJ_t1JjAIoCmso=.9c6b2139-0a59-43e6-81ce-b5bc5c649744@github.com> References: <2TRfqPePF5VnERueckcKG9YeMKZaulJ_t1JjAIoCmso=.9c6b2139-0a59-43e6-81ce-b5bc5c649744@github.com> Message-ID: On Tue, 20 May 2025 08:43:13 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Offline review > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 197: > >> 195: *

>> 196: * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and >> 197: * {@link TestAdvanced}. > > These links cannot be resolved by `javadoc` when using the command recommended in the PR description: > > $ javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework > Loading source files for package compiler.lib.template_framework... > Constructing Javadoc information... > Building index for all the packages and classes... > Standard Doclet version 25-internal-2025-05-20-0715327.rocastan.open > Building tree for all the packages and classes... > test/hotspot/jtreg/compiler/lib/template_framework/Template.java:196: error: reference not found > * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and > ^ > test/hotspot/jtreg/compiler/lib/template_framework/Template.java:196: error: reference not found > * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and > ^ > test/hotspot/jtreg/compiler/lib/template_framework/Template.java:197: error: reference not found > * {@link TestAdvanced}. > ^ > (...) @robcasloz I saw that as well. But IDE / neovim still thinks its ok. Do you know an alternative that works for `javadoc`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097398947 From roland at openjdk.org Tue May 20 08:54:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 May 2025 08:54:52 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 07:06:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. 
`ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - fix comment > - fix comment `CastLL` related change looks good to me. `ArrayCopyNode::get_partial_inline_vector_lane_count()` looks reasonable but let's see what @jatin-bhateja thinks. src/hotspot/share/opto/arraycopynode.cpp line 772: > 770: > 771: // As an optimization, choose the optimal vector size for bounded copy length > 772: int ArrayCopyNode::get_partial_inline_vector_lane_count(BasicType type, jlong max_len) { @jatin-bhateja you wrote this code. What do you think of the proposed change? ------------- PR Review: https://git.openjdk.org/jdk/pull/25284#pullrequestreview-2853320078 PR Review Comment: https://git.openjdk.org/jdk/pull/25284#discussion_r2097400484 From epeter at openjdk.org Tue May 20 09:02:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 09:02:54 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null In-Reply-To: References: Message-ID: <04Og4ec5FPNyFYVFM565_eKF4Kv8zbE623ovl01If8k=.64c23b9d-5512-4c1c-8219-cc2fede5f7a1@github.com> On Fri, 16 May 2025 14:16:29 GMT, Roland Westrelin wrote: > During IGVN, `TypeNode::make_paths_from_here_dead()` follows data > nodes until a `Phi`. The `Region` input for the input that that logic > goes through to reach the `Phi` is `null` causing the crash. 
I propose > simply adding an extra check for that corner case. How do we get to this case that the `phi->in(j) != nullptr` but `region->in(j) == nullptr`? I agree with @TobiHartmann : it is generally nicer if the test is a little more cleaned up, and if possible even some comments on how we get to the pattern in question. It can make it easier for someone encountering issues with this test later. But we leave that up to you, and also understand if you don't want to spend too much time on it. test/hotspot/jtreg/compiler/c2/TestNullRegionInputAtPhiMakePathDead.java line 28: > 26: * @bug 8355230 > 27: * @summary Crash in fuzzer tests: assert(n != nullptr) failed: must not be null > 28: * @run main/othervm -XX:CompileCommand=compileonly,TestNullRegionInputAtPhiMakePathDead::* -Xcomp TestNullRegionInputAtPhiMakePathDead What about a run without `Xcomp`? ------------- PR Review: https://git.openjdk.org/jdk/pull/25268#pullrequestreview-2853328997 PR Review Comment: https://git.openjdk.org/jdk/pull/25268#discussion_r2097406552 From rcastanedalo at openjdk.org Tue May 20 09:08:07 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 20 May 2025 09:08:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 14:42:40 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Offline review test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 159: > 157: *

> 158: * Ideally, we would have used {@link String} Templates to inject these Template arguments into the strings. > 159: * But since {@link String} Templates are not (yet) available, the Templates provide hashtag replacements Suggestion: * Ideally, we would have used string templates to inject these Template arguments into the strings. * But since string templates are not (yet) available, the Templates provide hashtag replacements test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 172: > 170: * > 171: *

> 172: * A {@link TemplateToken} can not just be used in {@link Template#body}, but it can also be Suggestion: * A {@link TemplateToken} cannot just be used in {@link Template#body}, but it can also be ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097411499 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097424818 From rcastanedalo at openjdk.org Tue May 20 09:08:08 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 20 May 2025 09:08:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: <2TRfqPePF5VnERueckcKG9YeMKZaulJ_t1JjAIoCmso=.9c6b2139-0a59-43e6-81ce-b5bc5c649744@github.com> Message-ID: On Tue, 20 May 2025 08:50:22 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 197: >> >>> 195: *

>>> 196: * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and >>> 197: * {@link TestAdvanced}. >> >> These links cannot be resolved by `javadoc` when using the command recommended in the PR description: >> >> $ javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.template_framework >> Loading source files for package compiler.lib.template_framework... >> Constructing Javadoc information... >> Building index for all the packages and classes... >> Standard Doclet version 25-internal-2025-05-20-0715327.rocastan.open >> Building tree for all the packages and classes... >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java:196: error: reference not found >> * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and >> ^ >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java:196: error: reference not found >> * More examples for these functionalities can be found in {@link TestTutorial}, {@link TestSimple}, and >> ^ >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java:197: error: reference not found >> * {@link TestAdvanced}. >> ^ >> (...) > > @robcasloz I saw that as well. But IDE / neovim still thinks its ok. Do you know an alternative that works for `javadoc`? I guess it is a matter of including `test/hotspot/jtreg/testlibrary_tests/template_framework/examples` into the source path when invoking `javadoc`. But I think it is fine to treat `Test*.java` as external sources and just use `{@code TestTutorial.java}` etc. as you do elsewhere in this file. 
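The difference between the two Javadoc tags discussed above can be sketched as follows. `{@link X}` asks `javadoc` to resolve `X` on its source path, whereas `{@code X}` only formats the text and resolves nothing. The class `TemplateDocExample` and its method are hypothetical stand-ins and are not part of the Template Framework.

```java
/**
 * Hypothetical stand-in for a library class whose examples live outside
 * the javadoc source path.
 *
 * <p>
 * A reference that javadoc can resolve may use a link: {@link java.util.List}.
 *
 * <p>
 * Examples kept in a separate test directory are named as plain code text
 * instead, e.g. {@code TestTutorial.java}, so javadoc does not try to
 * resolve them and reports no "reference not found" error.
 */
final class TemplateDocExample {
    private TemplateDocExample() {} // static helper only, no instances

    /** Returns the file name that the class comment mentions as {@code TestTutorial.java}. */
    static String tutorialFileName() {
        return "TestTutorial.java";
    }
}
```

With this form the documentation still names the example files, but building it no longer depends on the examples directory being on the source path.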
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097420698 From rcastanedalo at openjdk.org Tue May 20 10:00:02 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 20 May 2025 10:00:02 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 14:42:40 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
>> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Offline review test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 184: > 182: * template can be changed when we {@code render()} it (e.g. {@link ZeroArgs#render(float)}) and the default > 183: * fuel cost with {@link #setFuelCost}) when defining the {@link #body(Object...)}. Recursive templates are > 184: * supposed to terminate once the {@link #fuel} is depleted (i.e. reaches zero). This sentence is a bit vague, maybe you could state explicitly who is responsible for ensuring termination (e.g. "Once the {@link #fuel} is depleted (i.e. reaches zero), the writer of a recursive template should ensure ..."). test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 252: > 250: * @param function The {@link Function} that creates the {@link TemplateBody} given the template argument. 
> 251: */ > 252: record OneArgs(String arg0Name, Function function) implements Template { Suggestion: rename to `OneArg`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097526940 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097495269 From roland at openjdk.org Tue May 20 10:00:51 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 May 2025 10:00:51 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null In-Reply-To: <04Og4ec5FPNyFYVFM565_eKF4Kv8zbE623ovl01If8k=.64c23b9d-5512-4c1c-8219-cc2fede5f7a1@github.com> References: <04Og4ec5FPNyFYVFM565_eKF4Kv8zbE623ovl01If8k=.64c23b9d-5512-4c1c-8219-cc2fede5f7a1@github.com> Message-ID: On Tue, 20 May 2025 09:00:08 GMT, Emanuel Peter wrote: > How do we get to this case that the `phi->in(j) != nullptr` but `region->in(j) == nullptr`? The `Region` is cleared by `RegionNode::Ideal` at parse time because it was initialized to `top`. When the crash occurs, the `Region` is enqueued for igvn but not yet processed. `RegionNode::Ideal` has logic to remove the null inputs and update the `Phi`s but it only runs during igvn. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25268#issuecomment-2893726576 From epeter at openjdk.org Tue May 20 10:08:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 10:08:51 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null In-Reply-To: References: Message-ID: On Fri, 16 May 2025 14:16:29 GMT, Roland Westrelin wrote: > During IGVN, `TypeNode::make_paths_from_here_dead()` follows data > nodes until a `Phi`. The `Region` input for the input that that logic > goes through to reach the `Phi` is `null` causing the crash. I propose > simply adding an extra check for that corner case. @rwestrel Thanks for the explanation! I leave it up to you if you want to update the test with my suggestions, or not. 
------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25268#pullrequestreview-2853565585 From fyang at openjdk.org Tue May 20 10:18:55 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 20 May 2025 10:18:55 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v12] In-Reply-To: <6pVNNL7kKbqHZP4Sj0tRGg1QCQbqrGZzIOMXd1jxGl4=.10d6ec8b-3661-4676-9224-de7f30f00c72@github.com> References: <6pVNNL7kKbqHZP4Sj0tRGg1QCQbqrGZzIOMXd1jxGl4=.10d6ec8b-3661-4676-9224-de7f30f00c72@github.com> Message-ID: On Tue, 20 May 2025 08:46:07 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic?s generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time >> >> on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below >> >> before the patch >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ? 0.392 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ? 0.013 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ? 0.045 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ? 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ? 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ? 0.061 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ? 0.096 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ? 0.197 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ? 0.195 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ? 0.019 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ? 2.795 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ? 0.936 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ? 0.879 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ? 
0.756 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ? 0.088 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ? 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ? 0.037 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ? 0.010 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ? 0.020 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ? 0.016 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ? 0.086 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ? 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ? 0.043 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ? 0.228 ns/o... > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > update code for optimize src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1696: > 1694: // One byte misalignment happens. > 1695: __ test_bit(t0, dest, 0); > 1696: __ beqz(t0, L_skip_align1); Can we use `tmp_reg` in places where `t0` is used in this function? src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1726: > 1724: // Remaining count is less than 8 bytes and address is heapword aligned. > 1725: { > 1726: Label L_fill_2, L_fill_1; You can declare a local `L_exit` and remove `L_exit1`. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1730: > 1728: __ beqz(t0, L_fill_2); > 1729: __ sw(value, Address(dest, 0)); > 1730: __ addi(dest, dest, 4); Leave a new line after this. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1735: > 1733: __ beqz(t0, L_fill_1); > 1734: __ sh(value, Address(dest, 0)); > 1735: __ addi(dest, dest, 2); Leave a new line after this. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1749: > 1747: __ bind(L_fill_elements); > 1748: { > 1749: Label L_fill_2, L_fill_1; You can declare a local `L_exit` and remove `L_exit2`. 
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1769: > 1767: __ beqz(t0, L_exit2); > 1768: __ sb(value, Address(dest, 0)); > 1769: __ addi(dest, dest, 1); No need to update `dest` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097576972 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097572103 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097573965 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097574187 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097572321 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097570585 From liach at openjdk.org Tue May 20 10:38:51 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 20 May 2025 10:38:51 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 06:07:07 GMT, Jaikiran Pai wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 47: >> >>> 45: * intrinsics necessary. >>> 46: *

>>> 47: * Intrinsification may never happen, or happen at any moment during execution. >> >> s/or happen/or may happen/ (easier to parse) > > Hello John, are there are any hotspot VM flags that can be enabled to check whether or not intrinsification happen for a particular method during the lifetime of an application? Should any of those flags be documented in this proposed text? I see there is a `ControlIntrinsic` flag in `globals.hpp`, but I am not sure how it actually interacts with intrinsics. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2097622562 From rcastanedalo at openjdk.org Tue May 20 11:22:04 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 20 May 2025 11:22:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 14:42:40 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Offline review test/hotspot/jtreg/compiler/lib/template_framework/library/Hooks.java line 32: > 30: */ > 31: public abstract class Hooks { > 32: private Hooks() {} // Avoid instanciation and need for documentation. Suggestion: private Hooks() {} // Avoid instantiation and need for documentation. 
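The `#name` and `$var` replacements from the quoted PR description can be illustrated with a toy renderer. This is a sketch only and not the Template Framework's API: the class `ToyTemplate`, its `render` method, and the counter-based suffix scheme are all made up for illustration, and the real framework may choose names and suffixes differently.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: NOT the Template Framework API.
final class ToyTemplate {
    private static final Pattern DOLLAR = Pattern.compile("\\$(\\w+)");
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    private final String body;
    private int uses = 0; // incremented per render so each use gets fresh names

    ToyTemplate(String body) {
        this.body = body;
    }

    String render(Map<String, String> args) {
        int id = ++uses;
        // "$var" becomes "var_<id>", so two uses of the same template
        // never declare the same variable name.
        String renamed = DOLLAR.matcher(body).replaceAll(m -> m.group(1) + "_" + id);
        // "#name" is replaced by the argument value, inserted literally.
        return HASHTAG.matcher(renamed).replaceAll(m -> {
            String value = args.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("missing argument: " + m.group(1));
            }
            return Matcher.quoteReplacement(value);
        });
    }
}
```

Rendering the body `int $x = #init; $x += #step;` twice with `init=42` and `step=7` yields `int x_1 = 42; x_1 += 7;` and then `int x_2 = 42; x_2 += 7;`, so the two uses can be concatenated into one method without a variable collision.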
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2097699129 From roland at openjdk.org Tue May 20 12:02:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 May 2025 12:02:14 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 10:06:24 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - test comment >> - test cleanup > > @rwestrel Thanks for the explanation! > > I leave it up to you if you want to update the test with my suggestions, or not. @eme64 @TobiHartmann I cleaned up the test a bit. I also added a comment. > test/hotspot/jtreg/compiler/c2/TestNullRegionInputAtPhiMakePathDead.java line 28: > >> 26: * @bug 8355230 >> 27: * @summary Crash in fuzzer tests: assert(n != nullptr) failed: must not be null >> 28: * @run main/othervm -XX:CompileCommand=compileonly,TestNullRegionInputAtPhiMakePathDead::* -Xcomp TestNullRegionInputAtPhiMakePathDead > > What about a run without `Xcomp`? It's unclear to me anything would get compiled then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25268#issuecomment-2894140938 PR Review Comment: https://git.openjdk.org/jdk/pull/25268#discussion_r2097769351 From roland at openjdk.org Tue May 20 12:02:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 May 2025 12:02:14 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null [v2] In-Reply-To: References: Message-ID: <2xnLET66HDNbYh1ZAnZhlACmgLmlg82yWKe6lkiLSYo=.d750f824-7afd-480b-b385-4bcc5fc6f47d@github.com> > During IGVN, `TypeNode::make_paths_from_here_dead()` follows data > nodes until a `Phi`. The `Region` input for the input that that logic > goes through to reach the `Phi` is `null` causing the crash. 
I propose
> simply adding an extra check for that corner case.

Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:

 - test comment
 - test cleanup

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25268/files
  - new: https://git.openjdk.org/jdk/pull/25268/files/22058200..1ab90292

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25268&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25268&range=00-01

  Stats: 40 lines in 1 file changed: 4 ins; 20 del; 16 mod
  Patch: https://git.openjdk.org/jdk/pull/25268.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25268/head:pull/25268

PR: https://git.openjdk.org/jdk/pull/25268

From rehn at openjdk.org Tue May 20 12:02:43 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 20 May 2025 12:02:43 GMT
Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v2]
In-Reply-To:
References:
Message-ID:

> Hi, please consider.
>
> While working on https://github.com/openjdk/jdk/pull/25252, I noticed:
> - Major op code was just repeated
> - Width coded in binary
> - Stores have mixed up rs1 and rs2
> - Bonus, fsd used a macro for no reason
>
> I think this improves readability.
>
> Tested tier1
>
> Thanks, Robbin

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision:

 - Fixed flh/flw/fld
 - Merge branch 'master' into asm_fixes
 - Fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/25253/files
  - new: https://git.openjdk.org/jdk/pull/25253/files/841f85da..2d658948

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=25253&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25253&range=00-01

  Stats: 18652 lines in 431 files changed: 6820 ins; 9951 del; 1881 mod
  Patch: https://git.openjdk.org/jdk/pull/25253.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25253/head:pull/25253

PR: https://git.openjdk.org/jdk/pull/25253

From rehn at openjdk.org Tue May 20 12:02:44 2025
From: rehn at openjdk.org (Robbin Ehn)
Date: Tue, 20 May 2025 12:02:44 GMT
Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v2]
In-Reply-To:
References:
Message-ID: <4ESya1OjwS5ElGcfrcDDFFn44YZhnhMzqnr-yB1dZAI=.15a22244-545a-4770-b13c-05653f6e9edd@github.com>

On Tue, 20 May 2025 01:33:33 GMT, Fei Yang wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>> - Fixed flh/flw/fld
>> - Merge branch 'master' into asm_fixes
>> - Fixes
>
> src/hotspot/cpu/riscv/assembler_riscv.hpp line 730:
>
>> 728:   void _ld(Register Rd, Register Rs, const int32_t offset) {
>> 729:     load_base(Rd, Rs, offset);
>> 730:   }
>
> Question: Can we refactor and define `flh`, `flw` and `fld` with this `load_base` as well?
> The definition of `fp_load` [1] looks quite similar to `load_base` here, so it could be factored out.
>
> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/assembler_riscv.hpp#L1352

Fixed!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2097768887 From dnsimon at openjdk.org Tue May 20 12:14:07 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 20 May 2025 12:14:07 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If, during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. > > Alternative solutions include: > 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created). > 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. > > I think the solution in this PR is the most robust for the long term.
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: consolidate JVMCI eager initialization ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/7eb259b9..32986d1a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=00-01 Stats: 41 lines in 5 files changed: 17 ins; 19 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From yzheng at openjdk.org Tue May 20 12:27:53 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 20 May 2025 12:27:53 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: <3aLK-TCHFl8-YyAX6Ppjm458pXwA5jGq6qssypzvTw0=.8ad6de3e-63f4-4c1b-bac4-01c84549a7d7@github.com> On Tue, 20 May 2025 12:14:07 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. 
>> >> Alternative solutions include: >> 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which in turn depends on how many libgraal compiler threads are created). >> 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. >> >> I think the solution in this PR is the most robust for the long term. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > consolidate JVMCI eager initialization LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/25307#pullrequestreview-2853970394 From duke at openjdk.org Tue May 20 12:45:34 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 20 May 2025 12:45:34 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v13] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V generator `generate_unsafe_setmemory` for the Unsafe::setMemory intrinsic. This intrinsic cuts Unsafe::setMemory time considerably. > > On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe gives the results below. > > before the patch
>
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: update code format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/af6995d7..4eb8c1d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=11-12 Stats: 10 lines in 1 file changed: 2 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Tue May 20 12:50:08 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 20 May 2025 12:50:08 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: Message-ID: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V generator `generate_unsafe_setmemory` for the Unsafe::setMemory intrinsic. This intrinsic cuts Unsafe::setMemory time considerably. > > On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe gives the results below. > > before the patch
>
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: change all the t0 with tmp_reg ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/4eb8c1d7..ff8c134e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=12-13 Stats: 18 lines in 1 file changed: 0 ins; 0 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Tue May 20 12:54:58 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 20 May 2025 12:54:58 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v12] In-Reply-To: References: <6pVNNL7kKbqHZP4Sj0tRGg1QCQbqrGZzIOMXd1jxGl4=.10d6ec8b-3661-4676-9224-de7f30f00c72@github.com> Message-ID: On Tue, 20 May 2025 10:13:29 GMT, Fei Yang wrote: >> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> update code for optimize > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1696: > >> 1694: // One byte misalignment happens. >> 1695: __ test_bit(t0, dest, 0); >> 1696: __ beqz(t0, L_skip_align1); > > Can we use `tmp_reg` in places where `t0` is used in this function? Thanks for the advice, fixed it all above! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2097888061 From dnsimon at openjdk.org Tue May 20 12:58:28 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 20 May 2025 12:58:28 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:52:02 GMT, Roman Kennke wrote: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. 
I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. > > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 498: > 496: declare_preprocessor_constant("ASSERT", DEBUG_ONLY(1) NOT_DEBUG(0)) \ > 497: \ > 498: declare_preprocessor_constant("INCLUDE_SERIALGC", INCLUDE_SERIALGC) \ Probably best to make the formatting consistent with how it's done for the `JVM_ACC_*` constants below (i.e., no alignment of values). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25325#discussion_r2097893655 From rkennke at openjdk.org Tue May 20 12:58:27 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 May 2025 12:58:27 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI Message-ID: I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. Testing: - [x] build/test https://github.com/oracle/graal/pull/10904 ------------- Commit messages: - 8357370: Export supported GCs in JVMCI Changes: https://git.openjdk.org/jdk/pull/25325/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357370 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25325/head:pull/25325 PR: https://git.openjdk.org/jdk/pull/25325 From rkennke at openjdk.org Tue May 20 13:12:06 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 May 2025 13:12:06 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v2] In-Reply-To: References: Message-ID: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. 
I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. > > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Don't align values ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25325/files - new: https://git.openjdk.org/jdk/pull/25325/files/7caef245..321a0940 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25325/head:pull/25325 PR: https://git.openjdk.org/jdk/pull/25325 From bkilambi at openjdk.org Tue May 20 13:17:26 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 20 May 2025 13:17:26 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge master - Remove additional spaces in the aarch64_vector_ad.m4 file - Address review comments - 8355585: Aarch64: Add aarch64 backend for Float16 vector operations This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. 
Testing: All JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java ------------- Changes: https://git.openjdk.org/jdk/pull/25096/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=03 Stats: 1105 lines in 9 files changed: 426 ins; 0 del; 679 mod Patch: https://git.openjdk.org/jdk/pull/25096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25096/head:pull/25096 PR: https://git.openjdk.org/jdk/pull/25096 From bkilambi at openjdk.org Tue May 20 13:22:52 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 20 May 2025 13:22:52 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v2] In-Reply-To: <7fFflMD9iyjyj_v2aGbJH9BD5ZzvHu7wW_NBeos2XBc=.451a7884-e077-459b-835b-d224a433ca48@github.com> References: <7fFflMD9iyjyj_v2aGbJH9BD5ZzvHu7wW_NBeos2XBc=.451a7884-e077-459b-835b-d224a433ca48@github.com> Message-ID: <5Hq0MaBbTHn_GsX8Sfwe2VN9BnyGV3-lBLu2NG8RhTI=.94bdf08d-232b-44dc-be46-b13763eb3021@github.com> On Wed, 14 May 2025 01:32:52 GMT, Xiaohong Gong wrote: >> Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > LGTM! Thanks! Thanks for all the reviews. I did a merge with master, resolved merge conflicts and tested the testcase as well and it passes successfully on aarch64. Can I please ask for a re-review? 
@XiaohongGong @shqking ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2894382123 From bkilambi at openjdk.org Tue May 20 13:22:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 20 May 2025 13:22:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:17:26 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge master > - Remove additional spaces in the aarch64_vector_ad.m4 file > - Address review comments > - 8355585: Aarch64: Add aarch64 backend for Float16 vector operations > > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector > operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > All JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) > pass on aarch64 which also includes the JTREG test to test the FP16 > vector operations - test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java Hi @theRealAph , Can I request you to please take a look at this PR? I need a "reviewer" to review my change as well for the PR to be ready for integration. Thanks in advance! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2894388077 From rkennke at openjdk.org Tue May 20 13:35:31 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 20 May 2025 13:35:31 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v3] In-Reply-To: References: Message-ID: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. > > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Align most trailing \s ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25325/files - new: https://git.openjdk.org/jdk/pull/25325/files/321a0940..16d82e7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25325&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25325.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25325/head:pull/25325 PR: https://git.openjdk.org/jdk/pull/25325 From dnsimon at openjdk.org Tue May 20 13:35:31 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 20 May 2025 13:35:31 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v3] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:32:08 GMT, Roman Kennke wrote: >> I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. 
>> >> Testing: >> - [x] build/test https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Align most trailing \s LGTM and trivial. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25325#pullrequestreview-2854208149 From thartmann at openjdk.org Tue May 20 13:48:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 20 May 2025 13:48:52 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null [v2] In-Reply-To: <2xnLET66HDNbYh1ZAnZhlACmgLmlg82yWKe6lkiLSYo=.d750f824-7afd-480b-b385-4bcc5fc6f47d@github.com> References: <2xnLET66HDNbYh1ZAnZhlACmgLmlg82yWKe6lkiLSYo=.d750f824-7afd-480b-b385-4bcc5fc6f47d@github.com> Message-ID: On Tue, 20 May 2025 12:02:14 GMT, Roland Westrelin wrote: >> During IGVN, `TypeNode::make_paths_from_here_dead()` follows data >> nodes until a `Phi`. The `Region` input for the input that that logic >> goes through to reach the `Phi` is `null` causing the crash. I propose >> simply adding an extra check for that corner case. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - test comment > - test cleanup Thanks for improving the test. Still looks good to me. Ship it! :) ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25268#pullrequestreview-2854266264 From epeter at openjdk.org Tue May 20 13:55:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 13:55:12 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 09:23:28 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/superword.cpp >> >> Co-authored-by: Manuel Hässig > >> Impressive analysis, Emanuel! Very deep, thorough, and insightful. > > +1 to this. Great work, Emanuel! The fix looks good to me. @TobiHartmann @iwanowww @mhaessig Thanks for reviewing! I'll integrate now, but we can still continue the conversation @theRealAph @XiaohongGong @jatin-bhateja. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2894494967 From epeter at openjdk.org Tue May 20 13:55:14 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 13:55:14 GMT Subject: Integrated: 8355094: Performance drop in auto-vectorized kernel due to split store In-Reply-To: References: Message-ID: On Tue, 6 May 2025 13:21:30 GMT, Emanuel Peter wrote: > **Summary** > > Before [JDK-8325155](https://bugs.openjdk.org/browse/JDK-8325155) / https://github.com/openjdk/jdk/pull/18822, we used to prefer aligning to stores. But in that change, I removed that preference, and since then we have been aligning to loads instead (there is no preference, but since loads usually come before stores in the loop body, the load gets picked). This led to a performance regression, especially on `x64`. > > Especially on `x64`, it is more important to align stores than to align loads. This is because memory operations that cross a cacheline boundary are split.
And `x64` CPUs generally have more throughput for loads than for stores, so splitting a store is worse than splitting a load. > > On `aarch64`, the results are less clear. On two machines, the differences were marginal, but surprisingly aligning to loads was marginally faster. On another machine, aligning to stores was significantly faster. I suspect performance depends on the exact `aarch64` implementation. I'm not an `aarch64` specialist, and only have access to a limited number of machines. > > **Fix**: make automatic alignment configurable with `SuperWordAutomaticAlignment` (no alignment, align to store, align to load). Default is align to store. > > For now, I will just align to stores on all platforms. If someone has various `aarch64` machines, they are welcome to do deeper investigations. Same for other platforms. We could always turn the flag into a platform dependent one, and set different defaults depending on the exact CPU. > > If you are interested you can read my investigations/benchmark results below. There are a lot of colorful plots. > > **FYI about Vector API:** if you are working with the Vector API, you may also want to worry about **alignment**, because there can be a **significant performance impact** (30%+ in some cases). You may also want to know about **4k aliasing**, discussed below. > > **Shoutout:** > - @jatin-bhateja filed the regression, and explained that it was about split stores. > - @mhaessig helped me talk through some of the early benchmarks. > - @iwanowww pointed me to the 4k aliasing explanation. > > -------------------- > > **Introduction** > > I had long lived with the **theory that on modern CPUs, misalignment has no consequence, especially no performance impact**. When you google, many sources say that misalignment used to be an issue on older CPUs, but not any more. > > That may **technically** be true: > - A misaligned load or store that does not cross a cacheline b... This pull request has now been integrated.
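The point about split accesses in the 8355094 summary above can be made concrete with a little address arithmetic. The Java sketch below is an illustration only, not code from the patch: it assumes a 64-byte cache line (common on x64, but an assumption here), the names are hypothetical, and it counts how many fixed-size accesses of a streaming loop straddle a line boundary for a given starting address.

```java
/** Illustrative sketch only; the 64-byte line size and all names are assumptions. */
final class CacheLineSplitSketch {
    static final int LINE_BYTES = 64; // assumed cache line size

    /** True if the access [address, address + sizeBytes) touches more than one cache line. */
    static boolean crossesLine(long address, int sizeBytes) {
        long firstLine = address / LINE_BYTES;
        long lastLine = (address + sizeBytes - 1) / LINE_BYTES;
        return firstLine != lastLine;
    }

    /** Counts split accesses for a loop that touches consecutive sizeBytes-wide chunks. */
    static int countSplits(long startAddress, int sizeBytes, int accesses) {
        int splits = 0;
        for (int i = 0; i < accesses; i++) {
            if (crossesLine(startAddress + (long) i * sizeBytes, sizeBytes)) {
                splits++;
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // 32-byte accesses: aligned start versus a start that is 16 bytes off.
        System.out.println("aligned:    " + countSplits(0, 32, 8) + " of 8 accesses split");
        System.out.println("misaligned: " + countSplits(16, 32, 8) + " of 8 accesses split");
    }
}
```

Under these assumptions an aligned stream of 32-byte accesses never splits, while the same stream started 16 bytes off splits on every second access. Since a loop can usually align only one of its memory streams, the sketch shows why the choice between aligning the store stream and aligning the load stream matters.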
Changeset: 277bb208 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/277bb208a2c6de888c57285854b6f5d030021f94 Stats: 341 lines in 4 files changed: 340 ins; 0 del; 1 mod 8355094: Performance drop in auto-vectorized kernel due to split store Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25065 From epeter at openjdk.org Tue May 20 14:16:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 14:16:52 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null [v2] In-Reply-To: <2xnLET66HDNbYh1ZAnZhlACmgLmlg82yWKe6lkiLSYo=.d750f824-7afd-480b-b385-4bcc5fc6f47d@github.com> References: <2xnLET66HDNbYh1ZAnZhlACmgLmlg82yWKe6lkiLSYo=.d750f824-7afd-480b-b385-4bcc5fc6f47d@github.com> Message-ID: On Tue, 20 May 2025 12:02:14 GMT, Roland Westrelin wrote: >> During IGVN, `TypeNode::make_paths_from_here_dead()` follows data >> nodes until a `Phi`. The `Region` input for the input that that logic >> goes through to reach the `Phi` is `null` causing the crash. I propose >> simply adding an extra check for that corner case. > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - test comment > - test cleanup Thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25268#pullrequestreview-2854392707 From roland at openjdk.org Tue May 20 14:21:01 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 May 2025 14:21:01 GMT Subject: RFR: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null [v2] In-Reply-To: References: <2xnLET66HDNbYh1ZAnZhlACmgLmlg82yWKe6lkiLSYo=.d750f824-7afd-480b-b385-4bcc5fc6f47d@github.com> Message-ID: On Tue, 20 May 2025 13:45:58 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - test comment >> - test cleanup > > Thanks for improving the test. Still looks good to me. Ship it! :) @TobiHartmann @eme64 thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25268#issuecomment-2894606710 From roland at openjdk.org Tue May 20 14:21:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 20 May 2025 14:21:02 GMT Subject: Integrated: 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null In-Reply-To: References: Message-ID: On Fri, 16 May 2025 14:16:29 GMT, Roland Westrelin wrote: > During IGVN, `TypeNode::make_paths_from_here_dead()` follows data > nodes until a `Phi`. The `Region` input for the input that that logic > goes through to reach the `Phi` is `null` causing the crash. I propose > simply adding an extra check for that corner case. This pull request has now been integrated. 
Changeset: 62d155e8 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/62d155e8c3b952ecf1f615666c7d71996ba43d74 Stats: 88 lines in 2 files changed: 87 ins; 0 del; 1 mod 8355230: Crash in fuzzer tests: assert(n != nullptr) failed: must not be null Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25268 From epeter at openjdk.org Tue May 20 14:41:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 14:41:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v41] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
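To make the `#name` and `$var` features described above easier to picture, here is a toy renderer in Java. It is deliberately not the Template Framework API from the PR: the class, its single method, and the `name_id` renaming scheme are assumptions made for illustration. It only shows the idea that `#name` is replaced by an argument value, while `$var` gets a suffix that is fresh for every rendering, so applying the same template twice does not produce colliding variable names.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy illustration only; hypothetical names, not the Template Framework API. */
final class TemplateRenamingSketch {
    // Matches "#name" (argument replacement) and "$name" (variable renaming).
    private static final Pattern PLACEHOLDER = Pattern.compile("([#$])(\\w+)");

    private int nextId = 1; // every rendering gets its own id

    String render(String body, Map<String, ?> args) {
        int id = nextId++;
        return PLACEHOLDER.matcher(body).replaceAll(match -> {
            String name = match.group(2);
            String text;
            if (match.group(1).equals("#")) {
                // "#name": format the argument value directly into the code.
                if (!args.containsKey(name)) {
                    throw new IllegalArgumentException("missing argument: " + name);
                }
                text = String.valueOf(args.get(name));
            } else {
                // "$name": make the variable name unique to this rendering.
                text = name + "_" + id;
            }
            return Matcher.quoteReplacement(text);
        });
    }

    public static void main(String[] args) {
        TemplateRenamingSketch renderer = new TemplateRenamingSketch();
        String body = "int $x = #value; $x++;";
        System.out.println(renderer.render(body, Map.of("value", 42)));
        System.out.println(renderer.render(body, Map.of("value", 7)));
    }
}
```

Rendering the same body twice yields `x_1` in the first result and `x_2` in the second, so both snippets could be pasted into one method without a name clash. The real framework covers much more (nesting, hooks, fuel, name scopes); this sketch covers only the two replacement rules.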
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  DataName / StructuralName refactoring

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24217/files
  - new: https://git.openjdk.org/jdk/pull/24217/files/aedc5095..ec7aab9e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=40
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=39-40

  Stats: 1259 lines in 9 files changed: 910 ins; 120 del; 229 mod
  Patch: https://git.openjdk.org/jdk/pull/24217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217

PR: https://git.openjdk.org/jdk/pull/24217

From epeter at openjdk.org Tue May 20 14:41:05 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 20 May 2025 14:41:05 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v16]
In-Reply-To:
References:
Message-ID:

On Wed, 14 May 2025 13:40:02 GMT, Roberto Castañeda Lozano wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> more documentation fixes
>
> A few more documentation suggestions, will continue reviewing this changeset over the next days.

@robcasloz Thanks for your review suggestions, I will address them shortly.

I have just pushed the `DataName / StructuralName` refactor that I had discussed with @chhagedorn and @mhaessig.

There are still a few smaller things missing:
- A test for adding names with weights that are too small or too large -> expect exception.
- Duplicate tests I have for `DataName`, so that we also test `StructuralName` sufficiently.
- More examples with how `DataName` interacts with `Hook.insert`.
- Renaming `Hook.set` to `Hook.anchor`
- Verify that there is no duplication of names with the same `name()`, both up and down the scopes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2894670454

From kvn at openjdk.org Tue May 20 14:41:38 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 20 May 2025 14:41:38 GMT
Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift!
Message-ID: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com>

Several fixes for AOT code generation:
- Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code)
- Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise)
- Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrote the start address of the instruction section (restored relocations in the AOTed exception blob were wrong, and AOT tests failed when deoptimization happened)
- Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1)
- Add additional `@requires` to tests for expected execution configuration

Tested hs-tier1-10, Xcomp, stress

-------------

Commit messages:
 - 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift!

Changes: https://git.openjdk.org/jdk/pull/25330/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25330&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357250
  Stats: 24 lines in 5 files changed: 13 ins; 2 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/25330.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25330/head:pull/25330

PR: https://git.openjdk.org/jdk/pull/25330

From kvn at openjdk.org Tue May 20 14:41:39 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 20 May 2025 14:41:39 GMT
Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift!
In-Reply-To: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: On Tue, 20 May 2025 14:34:20 GMT, Vladimir Kozlov wrote: > Several fixes for AOT code generation: > - Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code) > - Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise) > - Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrite the start address of instruction section (restored relocations in AOTed exception blob was wrong and AOT tests failed when deoptimization happened) > - Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1) > - Add additions `@requires` to tests for expected execution configuration > > Tested hs-tier1-10, Xcomp, stress @ashu-mehra and @adinn please review. @TheRealMDoerr please look. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25330#issuecomment-2894669650
PR Comment: https://git.openjdk.org/jdk/pull/25330#issuecomment-2894679142

From roland at openjdk.org Tue May 20 14:52:58 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 20 May 2025 14:52:58 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5]
In-Reply-To:
References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com>
Message-ID: <6qes6PQ7RtzBVWkPkTopPLs5E9E5SGOekPi_qwBMu1A=.26952f0f-2050-4431-a8f4-0949202510d4@github.com>

On Mon, 19 May 2025 13:47:18 GMT, Roberto Castañeda Lozano wrote:

> I still think it would be good to include test cases to confirm that these are not only theoretical concerns, but that should not block the progress of this PR.

Here is a test case:

    import java.util.Arrays;

    public class TestAllocNoUseBadMemoryState {
        private static volatile int volatileField;

        public static void main(String[] args) {
            boolean[] allTrue = new boolean[3];
            Arrays.fill(allTrue, true);
            A a = new A();
            boolean[] allFalse = new boolean[3];
            for (int i = 0; i < 20_000; i++) {
                a.field1 = 0;
                test1(a, allTrue);
                test1(a, allFalse);
                if (a.field1 != 42) {
                    throw new RuntimeException("Lost Store");
                }
            }
        }

        private static void test1(A otherA, boolean[] flags) {
            if (flags == null) {
            }
            otherA.field1 = 42;
            for (int i = 0; i < 3; i++) {
                A a = new A();
                if (flags[i]) {
                    break;
                }
            }
        }

        private static class A {
            int field1;
        }
    }

where all the damage is done early on when EA runs. A pass of loop opts before EA fully unrolls the loop and creates memory `Phi`s with incorrect `adr_type` (raw memory). Then EA removes the allocation. All that keeps the `Store` to `field1` alive then is uncommon traps from template predicates. Once they are removed, the `Store` goes away (first round of loop opts after EA).

I'll add that test case to the PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2894719871 From thartmann at openjdk.org Tue May 20 14:54:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 20 May 2025 14:54:52 GMT Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! In-Reply-To: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: On Tue, 20 May 2025 14:34:20 GMT, Vladimir Kozlov wrote: > Several fixes for AOT code generation: > - Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code) > - Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise) > - Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrite the start address of instruction section (restored relocations in AOTed exception blob was wrong and AOT tests failed when deoptimization happened) > - Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1) > - Add additions `@requires` to tests for expected execution configuration > > Tested hs-tier1-10, Xcomp, stress Changes requested by thartmann (Reviewer). 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5352:

> 5350:   // not the shift because it is not allowed to change
> 5351:   int shift = CompressedKlassPointers::shift();
> 5352:   assert(shift >= 0 && shift <= CompressedKlassPointers::max_shift(), "unexpected compressd klass shift!");

Suggestion:

    assert(shift >= 0 && shift <= CompressedKlassPointers::max_shift(), "unexpected compressed klass shift!");

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5418:

> 5416:   // not the shift because it is not allowed to change
> 5417:   int shift = CompressedKlassPointers::shift();
> 5418:   assert(shift >= 0 && shift <= CompressedKlassPointers::max_shift(), "unexpected compressd klass shift!");

Suggestion:

    assert(shift >= 0 && shift <= CompressedKlassPointers::max_shift(), "unexpected compressed klass shift!");

-------------

PR Review: https://git.openjdk.org/jdk/pull/25330#pullrequestreview-2854545412
PR Review Comment: https://git.openjdk.org/jdk/pull/25330#discussion_r2098191379
PR Review Comment: https://git.openjdk.org/jdk/pull/25330#discussion_r2098191589

From kvn at openjdk.org Tue May 20 15:27:32 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 20 May 2025 15:27:32 GMT
Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift!
[v2] In-Reply-To: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: > Several fixes for AOT code generation: > - Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code) > - Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise) > - Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrite the start address of instruction section (restored relocations in AOTed exception blob was wrong and AOT tests failed when deoptimization happened) > - Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1) > - Add additions `@requires` to tests for expected execution configuration > > Tested hs-tier1-10, Xcomp, stress Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Co-authored-by: Tobias Hartmann - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25330/files - new: https://git.openjdk.org/jdk/pull/25330/files/c4dc91f0..d2afc0b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25330&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25330&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25330.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25330/head:pull/25330 PR: https://git.openjdk.org/jdk/pull/25330 From kvn at openjdk.org Tue May 20 15:27:32 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 May 2025 15:27:32 GMT Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: 
unexpected compressd klass shift! [v2] In-Reply-To: References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: On Tue, 20 May 2025 14:52:14 GMT, Tobias Hartmann wrote: >> Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> >> Co-authored-by: Tobias Hartmann > > Changes requested by thartmann (Reviewer). Thank you, @TobiHartmann for look ------------- PR Comment: https://git.openjdk.org/jdk/pull/25330#issuecomment-2894858167 From epeter at openjdk.org Tue May 20 15:44:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 15:44:51 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v42] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
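The idea of using `fuel` to limit recursive templates can be sketched without the framework. The following is a toy illustration under stated assumptions, not the Template Framework's actual mechanism: the class `ToyFuelExpression`, the method `expression`, and its parameters are hypothetical names, and the fixed per-level fuel cost is an assumption made for the example. Each nesting level spends fuel, and once the fuel is depleted the generator must emit a non-recursive leaf.

```java
// Hypothetical toy generator, NOT the real Template Framework.
public class ToyFuelExpression {

    // Generate a nested "+" expression. Every recursion level costs fuelCost.
    // When fuel is depleted (zero or below), return a leaf so recursion ends.
    static String expression(float fuel, float fuelCost, int[] leafCounter) {
        if (fuel <= 0) {
            // Out of fuel: terminate with a leaf instead of recursing further.
            return "c" + leafCounter[0]++;
        }
        float remaining = fuel - fuelCost;
        String left = expression(remaining, fuelCost, leafCounter);
        String right = expression(remaining, fuelCost, leafCounter);
        return "(" + left + " + " + right + ")";
    }

    public static void main(String[] args) {
        // Fuel 20 with cost 10 allows two nesting levels before leaves appear.
        System.out.println(expression(20f, 10f, new int[] {0}));
        // With no fuel at all, only a leaf is produced.
        System.out.println(expression(0f, 10f, new int[] {0}));
    }
}
```

In this sketch the termination check lives in the generator itself, which mirrors the point that the writer of a recursive template is the one who has to stop recursing when fuel runs out.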
Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision:

 - Update test/hotspot/jtreg/compiler/lib/template_framework/Template.java

   Co-authored-by: Roberto Castañeda Lozano
 - Update test/hotspot/jtreg/compiler/lib/template_framework/Template.java

   Co-authored-by: Roberto Castañeda Lozano
 - Update test/hotspot/jtreg/compiler/lib/template_framework/library/Hooks.java

   Co-authored-by: Roberto Castañeda Lozano

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24217/files
  - new: https://git.openjdk.org/jdk/pull/24217/files/ec7aab9e..05e96837

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=41
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=40-41

  Stats: 5 lines in 2 files changed: 1 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/24217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217

PR: https://git.openjdk.org/jdk/pull/24217

From epeter at openjdk.org Tue May 20 15:44:51 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 20 May 2025 15:44:51 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v40]
In-Reply-To:
References: <2TRfqPePF5VnERueckcKG9YeMKZaulJ_t1JjAIoCmso=.9c6b2139-0a59-43e6-81ce-b5bc5c649744@github.com>
Message-ID:

On Tue, 20 May 2025 09:00:05 GMT, Roberto Castañeda Lozano wrote:

>> @robcasloz I saw that as well. But IDE / neovim still thinks it's ok. Do you know an alternative that works for `javadoc`?
>
> I guess it is a matter of including `test/hotspot/jtreg/testlibrary_tests/template_framework/examples` into the source path when invoking `javadoc`. But I think it is fine to treat `Test*.java` as external sources and just use `{@code TestTutorial.java}` etc. as you do elsewhere in this file.
Updated :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2098300689 From asmehra at openjdk.org Tue May 20 15:49:53 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Tue, 20 May 2025 15:49:53 GMT Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! [v2] In-Reply-To: References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: <51-9faxuUS_QCeLvmwg7fu1Z0qGi-jfSHEpNTPZSACs=.b6417873-7120-4c5d-b682-2c07e62fb52a@github.com> On Tue, 20 May 2025 15:27:32 GMT, Vladimir Kozlov wrote: >> Several fixes for AOT code generation: >> - Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code) >> - Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise) >> - Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrite the start address of instruction section (restored relocations in AOTed exception blob was wrong and AOT tests failed when deoptimization happened) >> - Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1) >> - Add additions `@requires` to tests for expected execution configuration >> >> Tested hs-tier1-10, Xcomp, stress > > Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > > Co-authored-by: Tobias Hartmann looks good! ------------- Marked as reviewed by asmehra (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25330#pullrequestreview-2854744491 From kvn at openjdk.org Tue May 20 15:54:03 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 May 2025 15:54:03 GMT Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! [v2] In-Reply-To: <51-9faxuUS_QCeLvmwg7fu1Z0qGi-jfSHEpNTPZSACs=.b6417873-7120-4c5d-b682-2c07e62fb52a@github.com> References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> <51-9faxuUS_QCeLvmwg7fu1Z0qGi-jfSHEpNTPZSACs=.b6417873-7120-4c5d-b682-2c07e62fb52a@github.com> Message-ID: On Tue, 20 May 2025 15:47:31 GMT, Ashutosh Mehra wrote: >> Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> >> Co-authored-by: Tobias Hartmann > > looks good! Thank you, @ashu-mehra ------------- PR Comment: https://git.openjdk.org/jdk/pull/25330#issuecomment-2894985007 From epeter at openjdk.org Tue May 20 15:55:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 15:55:13 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 09:48:19 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Offline review > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 184: > >> 182: * template can be changed when we {@code render()} it (e.g. {@link ZeroArgs#render(float)}) and the default >> 183: * fuel cost with {@link #setFuelCost}) when defining the {@link #body(Object...)}. Recursive templates are >> 184: * supposed to terminate once the {@link #fuel} is depleted (i.e. reaches zero). 
> > This sentence is a bit vague, maybe you could state explicitly who is responsible for ensuring termination (e.g. "Once the {@link #fuel} is depleted (i.e. reaches zero), the writer of a recursive template should ensure ..."). I updated the section a little. I hope it is better now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2098326231 From epeter at openjdk.org Tue May 20 16:02:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 16:02:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:40:02 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > A few more documentation suggestions, will continue reviewing this changeset over the next days. @robcasloz I addressed all your comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2895024975 From epeter at openjdk.org Tue May 20 16:02:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 16:02:02 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v43] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Roberto ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/05e96837..f9457b47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=41-42 Stats: 24 lines in 5 files changed: 4 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Tue May 20 16:02:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 16:02:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 09:35:47 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Offline review > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 252: > >> 250: * @param function The {@link Function} that creates the {@link TemplateBody} given the template argument. >> 251: */ >> 252: record OneArgs(String arg0Name, Function function) implements Template { > > Suggestion: rename to `OneArg`. Updated! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2098335507 From epeter at openjdk.org Tue May 20 16:16:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 16:16:30 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v44] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Hook.set -> Hook.anchor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/f9457b47..ab4f358c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=42-43 Stats: 63 lines in 7 files changed: 0 ins; 0 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From sparasa at openjdk.org Tue May 20 16:28:01 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 16:28:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: <_NKUzWKraGjXca9jq2QEAdYOC8tDVVB_d2ZS3hQ7VRs=.c079f2d0-ff2f-453a-bdef-ae711a0249ab@github.com> On Fri, 16 May 2025 18:18:01 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa 
has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 12948: > >> 12946: int encode = is_prefixq ? prefixq_and_encode(src_enc, dst_enc, is_map1) : prefix_and_encode(src_enc, dst_enc, is_map1); >> 12947: emit_opcode_prefix_and_encoding((unsigned char)byte1, 0xC0, encode, imm8); >> 12948: } else { > > FTR, existing demotion w.r.t to first operand is safe for all kinds to instructions, for commutative instructions, add, mul, xor, and, or, max , min etc, we can check against the second operand by passing is_commutative flags from top level assembler instruction. I am ok to handle this as part of https://bugs.openjdk.org/browse/JDK-8354348 As suggested, the demotion for commutative operations will be addressed in JDK-8354348. > src/hotspot/cpu/x86/assembler_x86.cpp line 12958: > >> 12956: void Assembler::evex_opcode_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, >> 12957: int size, int byte1, bool no_flags, bool is_map1) { >> 12958: bool is_prefixq = (size == EVEX_64bit); > > Nit pick, on line https://github.com/openjdk/jdk/pull/24431/files#diff-e3576e9c22db89236cdb906f032ff00748ff6d1c21b05277d991d80af75daf3aR12944 > you are using a conditional operator to select b/w true and false., Lets follow one convention. Please see this fixed in the updated code. > src/hotspot/cpu/x86/assembler_x86.cpp line 12962: > >> 12960: if (size == EVEX_16bit) { >> 12961: emit_int8(0x66); >> 12962: } > > I cannot find a caller that passes EVEX_16bit for the size argument. Please see the code block removed in the updated code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098396495 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098392314 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098393013 From sparasa at openjdk.org Tue May 20 16:31:06 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 16:31:06 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 01:31:14 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 13002: > >> 13000: } >> 13001: >> 13002: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc, > > Suggestion: > > int Assembler::emit_eevex_prefix_or_demote_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc, > > > Nameing suggetion There are different overloaded functions with the name evex_prefix_and_encode_ndd. Just want to confirm that this renaming suggestion must be applied to all of them with the same name (which also do demotion), right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098403723 From epeter at openjdk.org Tue May 20 16:35:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 16:35:44 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v45] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). 
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: test for small and large weights -> expect exception ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/ab4f358c..880908aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=44 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=43-44 Stats: 71 lines in 2 files changed: 67 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From aph at openjdk.org Tue May 20 16:51:54 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 20 May 2025 16:51:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:17:26 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains four commits: > > - Merge master > - Remove additional spaces in the aarch64_vector_ad.m4 file > - Address review comments > - 8355585: Aarch64: Add aarch64 backend for Float16 vector operations > > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector > operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > All JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) > pass on aarch64 which also includes the JTREG test to test the FP16 > vector operations - test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java Looks good. I'm assuming you've tested both SVE and Neon. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25096#pullrequestreview-2854939316 From rcastanedalo at openjdk.org Tue May 20 17:09:06 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 20 May 2025 17:09:06 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 13:40:02 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more documentation fixes > > A few more documentation suggestions, will continue reviewing this changeset over the next days. > @robcasloz I addressed all your comments :) Thanks @eme64!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2895225586 From rcastanedalo at openjdk.org Tue May 20 17:09:07 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 20 May 2025 17:09:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 15:52:11 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 184: >> >>> 182: * template can be changed when we {@code render()} it (e.g. {@link ZeroArgs#render(float)}) and the default >>> 183: * fuel cost with {@link #setFuelCost}) when defining the {@link #body(Object...)}. Recursive templates are >>> 184: * supposed to terminate once the {@link #fuel} is depleted (i.e. reaches zero). >> >> This sentence is a bit vague, maybe you could state explicitly who is responsible for ensuring termination (e.g. "Once the {@link #fuel} is depleted (i.e. reaches zero), the writer of a recursive template should ensure ..."). > > I updated the section a little. I hope it is better now? Yes, thanks! For better readability, I suggest merging the paragraph ending with "With the indirection of such a binding, a Template can reference itself." and the paragraph starting with "The writer of recursive {@link Template}s must ensure that this recursion terminates" into a single paragraph. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2098476090 From epeter at openjdk.org Tue May 20 17:15:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 17:15:00 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:19:50 GMT, Bhavana Kilambi wrote: >> Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains four commits: >> >> - Merge master >> - Remove additional spaces in the aarch64_vector_ad.m4 file >> - Address review comments >> - 8355585: Aarch64: Add aarch64 backend for Float16 vector operations >> >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector >> operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> All JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) >> pass on aarch64 which also includes the JTREG test to test the FP16 >> vector operations - test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java > > Hi @theRealAph , Can I request you to please take a look at this PR? I need a "reviewer" to review my change as well for the PR to be ready for integration. Thanks in advance! @Bhavana-Kilambi I'd like to run some testing before integration, please ping me again in 24h for results :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2895238712 From epeter at openjdk.org Tue May 20 17:20:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 17:20:48 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v46] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more tests for StructuralName ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/880908aa..7c400757 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=44-45 Stats: 271 lines in 1 file changed: 259 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Tue May 20 17:25:04 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 17:25:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v40] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:05:59 GMT, Roberto Castañeda Lozano wrote: >> I updated the section a little. I hope it is better now? > > Yes, thanks! For better readability, I suggest merging the paragraph ending with "With the indirection of such a binding, a Template can reference itself." and the paragraph starting with "The writer of recursive {@link Template}s must ensure that this recursion terminates" into a single paragraph. Hmm. One can also have recursive templates without the Binding. That is why I thought I might want to separate them. Basically, you can have a Template factory that pumps out the same copy over and over. That way, the indirection goes through that factory rather than through the binding. It is recursion nonetheless, just recursion through the Template factory rather than the binding. But I also don't necessarily want to explain all of that... What do you think? Is it OK to leave them separate, or do you have any other suggestions?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2098503912 From epeter at openjdk.org Tue May 20 17:49:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 20 May 2025 17:49:23 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v47] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Add verification to avoid duplicate names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/7c400757..8d3318d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=45-46 Stats: 41 lines in 1 file changed: 40 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From jbhateja at openjdk.org Tue May 20 19:12:43 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 20 May 2025 19:12:43 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v3] In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: 
<6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. > > > 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 > EVEX OP MR SIB DISP IMM > --------------|----|----|----|---------------|-----| > 62 6b c1 40 25 84 ec 40 30 20 10 ff > > 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 > For full vector width operation, scalar matches with vector size, hence scale N = 64 > effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 > EVEX OP MR SIB DISP IMM > -------------|----|---|---|-----------|---| > 62 6b c1 40 25 44 ec 01 ff > > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25021/files - new: https://git.openjdk.org/jdk/pull/25021/files/3e0f0410..0eead21e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=01-02 Stats: 15 lines in 1 file changed: 2 ins; 2 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25021/head:pull/25021 PR: https://git.openjdk.org/jdk/pull/25021 From mdoerr at openjdk.org Tue May 20 20:14:56 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 20 May 2025 20:14:56 GMT Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! [v2] In-Reply-To: References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: On Tue, 20 May 2025 15:27:32 GMT, Vladimir Kozlov wrote: >> Several fixes for AOT code generation: >> - Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code) >> - Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise) >> - Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrite the start address of instruction section (restored relocations in AOTed exception blob was wrong and AOT tests failed when deoptimization happened) >> - Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1) >> - Add additions `@requires` to tests for expected execution configuration >> >> Tested hs-tier1-10, Xcomp, stress > > Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: > > - Update 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > > Co-authored-by: Tobias Hartmann > - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp > > Co-authored-by: Tobias Hartmann LGTM and tier1-2 have passed on MacOS aarch64. Thanks for fixing it! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25330#pullrequestreview-2855447162 From kvn at openjdk.org Tue May 20 20:14:56 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 May 2025 20:14:56 GMT Subject: RFR: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! [v2] In-Reply-To: References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: <9TAUaw3ybPH5ownxRH9OhzcREFU1tXaA_Li6yLc8hsM=.3a8f80cb-c192-4df8-acbd-75a9503a0bc3@github.com> On Tue, 20 May 2025 20:09:37 GMT, Martin Doerr wrote: >> Vladimir Kozlov has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp >> >> Co-authored-by: Tobias Hartmann > > LGTM and tier1-2 have passed on MacOS aarch64. Thanks for fixing it! Thank you. @TheRealMDoerr ------------- PR Comment: https://git.openjdk.org/jdk/pull/25330#issuecomment-2895713046 From kvn at openjdk.org Tue May 20 20:14:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 20 May 2025 20:14:57 GMT Subject: Integrated: 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! 
In-Reply-To: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> References: <2Y9-e32nTG2rdV_V8fY2Y3jJoiIk_jiAJLqWfNS5mYM=.4574090a-c880-48ab-8aba-2186894ee412@github.com> Message-ID: On Tue, 20 May 2025 14:34:20 GMT, Vladimir Kozlov wrote: > Several fixes for AOT code generation: > - Use `CompressedKlassPointers::max_shift()` in asserts to take into account Compact Object Headers (the asserts are present only in aarch64 code) > - Increase table stub size on aarch64 when AOT specialized code is used for klass decoding (hit assert there otherwise) > - Fix "copy-paste" typo in `RelocIterator()` which incorrectly overwrite the start address of instruction section (restored relocations in AOTed exception blob was wrong and AOT tests failed when deoptimization happened) > - Removed `vm.flagless` from AOT code tests to increase testing coverage (otherwise they were run only in tier1) > - Add additions `@requires` to tests for expected execution configuration > > Tested hs-tier1-10, Xcomp, stress This pull request has now been integrated. Changeset: cedd1a53 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/cedd1a5343dceb5394b8ed5ea78bb717f05c8caf Stats: 24 lines in 5 files changed: 13 ins; 2 del; 9 mod 8357250: assert(shift >= 0 && shift < 4) failed: unexpected compressd klass shift! 
Reviewed-by: asmehra, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/25330 From sviswanathan at openjdk.org Tue May 20 20:33:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 20 May 2025 20:33:53 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v3] In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Tue, 20 May 2025 19:12:43 GMT, Jatin Bhateja wrote: >> PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. >> AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. >> >> e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. >> >> >> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 >> EVEX OP MR SIB DISP IMM >> --------------|----|----|----|---------------|-----| >> 62 6b c1 40 25 84 ec 40 30 20 10 ff >> >> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 >> For full vector width operation, scalar matches with vector size, hence scale N = 64 >> effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 >> EVEX OP MR SIB DISP IMM >> -------------|----|---|---|-----------|---| >> 62 6b c1 40 25 44 ec 01 ff >> >> >> Kindly review and share your feedback. 
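The compressed-displacement rule quoted above can be sketched in C++ as follows. This is only a minimal illustration under stated assumptions, not HotSpot code: `try_compress_disp8` is a hypothetical helper, and the scale N is passed in directly rather than derived from tuple type, vector length and broadcast. The sketch also makes one detail explicit: divisibility by N is necessary but not sufficient, because the quotient must fit in a signed byte. That is why 0x10203040, which is itself a multiple of 64, still needs a 4-byte displacement.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only (not part of HotSpot).
// Tries to express `disp` as a compressed 8-bit displacement for scale `n`.
// On success, stores the encoded byte in *out and returns true.
inline bool try_compress_disp8(int32_t disp, int32_t n, int8_t* out) {
  if (n <= 0 || disp % n != 0) {
    return false;                 // not a multiple of the scale factor N
  }
  const int32_t quotient = disp / n;
  if (quotient < -128 || quotient > 127) {
    return false;                 // multiple of N, but too large for one byte
  }
  *out = static_cast<int8_t>(quotient);
  return true;
}
```

With N = 64, a displacement of 0x40 yields the single byte 0x01 shown in the second encoding above, while 0x10203040 falls back to the full 32-bit displacement.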
>> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions Thanks for the update. It looks like you missed changing the input_size_in_bits to EVEX_64bit for evgatherdpd. src/hotspot/cpu/x86/assembler_x86.cpp line 5355: > 5353: assert(dst != xnoreg, "sanity"); > 5354: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ true); > 5355: attributes.set_address_attributes(/* tuple_type */ EVEX_QVM, /* input_size_in_bits */ EVEX_NObit); This should have remained as EVEX_HVM. src/hotspot/cpu/x86/assembler_x86.cpp line 5384: > 5382: InstructionMark im(this); > 5383: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ false, /* uses_vl */ true); > 5384: attributes.set_address_attributes(/* tuple_type */ EVEX_QVM, /* input_size_in_bits */ EVEX_NObit); This should have remained as EVEX_HVM. src/hotspot/cpu/x86/assembler_x86.cpp line 6089: > 6087: InstructionMark im(this); > 6088: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_vlbw, /* no_mask_reg */ true, /* uses_vl */ true); > 6089: attributes.set_address_attributes(/* tuple_type */ EVEX_FV, /* input_size_in_bits */ EVEX_NObit); This should have remained as EVEX_FVM. src/hotspot/cpu/x86/assembler_x86.cpp line 11395: > 11393: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 11394: InstructionAttr attributes(vector_len, /* vex_w */ false,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > 11395: attributes.set_address_attributes(/* tuple_type */ EVEX_FV,/* input_size_in_bits */ EVEX_NObit); This should have remained as EVEX_FVM. 
src/hotspot/cpu/x86/assembler_x86.cpp line 11423: > 11421: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); > 11422: InstructionAttr attributes(vector_len, /* vex_w */ false,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); > 11423: attributes.set_address_attributes(/* tuple_type */ EVEX_FV,/* input_size_in_bits */ EVEX_NObit); This should have remained as EVEX_FVM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2895758593 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2098798618 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2098801371 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2098809418 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2098815794 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2098815336 From sparasa at openjdk.org Tue May 20 20:41:44 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 20:41:44 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v30] In-Reply-To: References: Message-ID: <9mmlYHt_XCnr5VCccYljnfeIm4MIxKuPrOVp1-NILhk=.e8cdbb3d-e607-4280-94db-bf90efd6224a@github.com> > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: - refactor evex_opcode_prefix_and_encode_swap to a single fuction; rename functions - emit_eevex_prefix_or_demote_arith_ndd ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/c0086590..d46d40f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=28-29 Stats: 115 lines in 2 files changed: 11 ins; 16 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Tue May 20 20:48:58 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 20:48:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 01:27:31 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 12973: > >> 12971: } >> 12972: >> 12973: void Assembler::evex_opcode_prefix_and_encode_swap(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, > > Can we also not unify this one with evex_opcode_prefix_and_encode by passing additional swap argument The two separate functions were unified in the updated code. 
> src/hotspot/cpu/x86/assembler_x86.cpp line 12991: > >> 12989: >> 12990: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, int src_enc, VexSimdPrefix pre, VexOpcode opc, >> 12991: InstructionAttr *attributes, bool no_flags, bool use_prefixq) { > > evex_prefix_and_encode_ndd => emit_eevex_prefix_or_demote_ndd > > Naming suggestion. Changed as suggested in the updated code. > src/hotspot/cpu/x86/assembler_x86.cpp line 13024: > >> 13022: } >> 13023: >> 13024: void Assembler::evex_prefix_arith(Register dst, Register nds, int32_t imm32, VexSimdPrefix pre, VexOpcode opc, > > Suggestion: > > void Assembler::emit_eevex_prefix_or_demote_arith_ndd(Register dst, Register nds, int32_t imm32, VexSimdPrefix pre, VexOpcode opc, Please see this renaming suggestion incorporated into the latest update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098846041 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098845187 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098844309 From sparasa at openjdk.org Tue May 20 20:48:59 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 20:48:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 16:28:26 GMT, Srinivas Vamsi Parasa wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 13002: >> >>> 13000: } >>> 13001: >>> 13002: int Assembler::evex_prefix_and_encode_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc, >> >> Suggestion: >> >> int Assembler::emit_eevex_prefix_or_demote_ndd(int dst_enc, int nds_enc, VexSimdPrefix pre, VexOpcode opc, >> >> >> Nameing suggetion > > There are different overloaded functions with the name evex_prefix_and_encode_ndd. Just want to confirm that this renaming suggestion must be applied to all of them with the same name (which also do demotion), right? 
This suggestion was incorporated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098846647 From sparasa at openjdk.org Tue May 20 20:51:58 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 20:51:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: On Fri, 16 May 2025 18:49:53 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 6935: > >> 6933: InstructionAttr attributes(AVX_128bit, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); >> 6934: attributes.set_address_attributes(/* tuple_type */ EVEX_NOSCALE, /* input_size_in_bits */ EVEX_32bit); >> 6935: evex_prefix_ndd(src, dst->encoding(), 0, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, &attributes, no_flags); > > To comply with existing convention like below > https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/assembler_x86.hpp#L762 > > We should use eevex instead of evex as the prefix There are some functions starting with `evex_prefix_*`; should all of those be replaced with `eevex_prefix_*` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2098851965 From dlong at openjdk.org Tue May 20 21:53:55 2025 From: dlong at openjdk.org (Dean Long) Date: Tue, 20 May 2025 21:53:55 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: On Tue, 20 May 2025 08:27:35 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1903: >> >>> 1901: jlong trip_count = (limit_con - init_con + stride_m)/new_stride_con; >>> 1902: // New trip count should satisfy next conditions. 
>>> 1903: assert(trip_count > 0 && (julong)trip_count <= (julong)1 << (sizeof(juint)*BitsPerByte-1), "sanity"); >> >> Suggestion: >> >> assert((julong)trip_count * 2 <= max_juint, "sanity"); >> >> This should catch negative values and any value that would make new_trip_count*2 below overflow. > > I'm not convinced this is relaxed enough or that it shouldn't overflow. Where we set trip_count: > https://github.com/openjdk/jdk/blob/e961b13cd68bc352b86af17c7e53df8537519beb/src/hotspot/share/opto/loopTransform.cpp#L133-L141 > we have a check that trip count is `< 2^32 - 1`, but it seems to me that the value of `trip_count` there might be 2^32 or 2^32-1 (same computation as the code I'm fixing). That's fine, I guess: if it would not fit in the `uint`, we don't record it. In the code I'm touching, `old_trip_count` is the value stored in the loop head previously. In the case where the new `trip_count` is 2^31, the old_trip_count hasn't been set since construction, so it's still `2^32 - 1` but without the exact flag (not sure what it means). So in the case where the new trip_count is 2^31 and old_trip_count is 2^32-1, the `*2` overflows and we get `adjust_min_trip == true`, which I presume is harmless (or maybe necessary?). With the version you suggest, we would guard against the overflow and allow `trip_count == 2^31-1`, but at the cost of crashing in the case of `trip_count == 2^31`, which seems possible to me (and the overflow would still happen in product builds). OK, I was thinking we needed to prevent the *2 below from overflowing. If we allow the *2 to overflow, then what's left is making sure the cast to uint doesn't change the value (overflow). To do that, we could relax the assert above to <= max_juint, or even better, use checked_cast to convert to uint below.
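To make the boundary cases in this exchange concrete, here is a minimal standalone C++ sketch. It is not HotSpot code: the three predicates only mirror the assert quoted from the PR, the suggested `* 2` form, and the relaxed `<= max_juint` form, and every function name is invented for illustration. `checked_narrow` is a hypothetical stand-in for HotSpot's `checked_cast`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// max_juint in the thread: 2^32 - 1.
constexpr uint64_t kMaxJuint = std::numeric_limits<uint32_t>::max();

// Assert as quoted from the PR: 0 < trip_count <= 2^31.
bool pr_assert_holds(int64_t trip_count) {
  return trip_count > 0 &&
         static_cast<uint64_t>(trip_count) <= (uint64_t{1} << 31);
}

// Suggested form: (julong)trip_count * 2 <= max_juint,
// which for positive values means trip_count <= 2^31 - 1.
bool suggested_assert_holds(int64_t trip_count) {
  return static_cast<uint64_t>(trip_count) * 2 <= kMaxJuint;
}

// Relaxed form: only require that the value fits in a 32-bit unsigned int.
bool relaxed_assert_holds(int64_t trip_count) {
  return trip_count > 0 && static_cast<uint64_t>(trip_count) <= kMaxJuint;
}

// Doubling in 32-bit unsigned arithmetic wraps modulo 2^32.
uint32_t doubled_as_uint(uint32_t trip_count) {
  return trip_count * 2u;
}

// Hypothetical stand-in for checked_cast: narrow and assert that the
// value survives the round trip unchanged.
template <typename To, typename From>
To checked_narrow(From value) {
  To result = static_cast<To>(value);
  assert(static_cast<From>(result) == value);
  return result;
}
```

Under these definitions, a trip count of 2^31 passes the quoted assert and the relaxed one but fails the suggested `* 2` form, which is the case raised in the reply; doubling 2^31 as a 32-bit unsigned value wraps to 0.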
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2098932684 From sparasa at openjdk.org Tue May 20 23:13:46 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 23:13:46 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v31] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. > > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with two additional commits since the last revision: - refactor evex_prefix_int8_operand - flip the swap to true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/d46d40f9..d47ee0b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=29-30 Stats: 43 lines in 2 files changed: 3 ins; 3 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Tue May 20 23:17:57 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 20 May 2025 23:17:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: On Sat, 17 May 2025 11:54:29 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test 
generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 1657: > >> 1655: void Assembler::eandl(Register dst, Register src1, Address src2, bool no_flags) { >> 1656: InstructionMark im(this); >> 1657: evex_prefix_int8_operand(dst, src1, src2, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0x23, no_flags); > > Nomenclature of routine is not very clear here, what do you mean by int8 operand, this is operating over 32bit word. Please see the name changed to `evex_opcode_prefix_and_encode` in the updated code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2099012422 From sviswanathan at openjdk.org Tue May 20 23:30:59 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 20 May 2025 23:30:59 GMT Subject: RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v4] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 17:28:48 GMT, Jatin Bhateja wrote: >> Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature. >> Intel E-core Xeons support only the AVX2 feature set and still compile Java implementation which is composed of logical operations. >> >> Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before being operated upon by logical operation there is an opportunity to optimize this using an efficient instruction sequence. >> >> Patch uses ANDPS and ANDPD logical instruction to generate efficient instruction sequences to absorb domain copy over penalty. Also, performs minor tuning for existing AVX512 instruction sequence based on VPTERNLOG instruction. 
>> >> Following are the performance numbers of the following existing microbenchmark >> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java >> >> Patch passes following validation test >> [test/jdk/java/lang/Math/IeeeRecommendedTests.java >> ](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java) >> >> >> Granite Rapids-AP (P-core Xeon) >> Baseline AVX512: >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 1296.141 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 838.954 ops/ns >> >> Withopt : >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 940.240 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 967.370 ops/ns >> >> Baseline AVX2: >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 63.673 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 26.898 ops/ns >> >> Withopt : >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 785.801 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 558.710 ops/ns >> >> Sierra Forest (E-core Xeon) >> Baseline: >> Benchmark (seed) Mode Cnt Score Error Units >> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 40.528 ops/ns >> o.o.b.vm.compiler.Signum._7_copySignDoubleTest N/A thrpt 2 25.101 ops/ns >> >> Withopt: >> Benchmark (seed) Mode Cnt Score Error Units >> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 676.... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Review comments resolutions > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8349138 > - Adding vector support along with some refactoring. 
> - Adding IR framework verification test > - 8349138: Optimize Math.copySign API for Intel e-core and p-core targets src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7157: > 7155: vpsllw(dst, src, shift, vlen_enc); > 7156: } else if (elem_sz == 4) { > 7157: vpslld(dst, src, shift, vlen_enc); AVX 1 supports 256-bit float/double vectors but only 128-bit vpsll, vpsrl, and vpor for integer vectors. So a 256-bit float/double vector copysign implementation that uses vpsll, vpsrl, and vpor will have issues on AVX 1 platforms. src/hotspot/cpu/x86/x86.ad line 6525: > 6523: %} > 6524: > 6525: #ifdef _LP64 The _LP64 ifdef is no longer needed in the .ad file (32-bit support has been removed). src/hotspot/cpu/x86/x86.ad line 6551: > 6549: #endif // _LP64 > 6550: > 6551: instruct copySignF_reg_avx(regF dst, regF src, regF xtmp) %{ These should be vlRegF. src/hotspot/cpu/x86/x86.ad line 6562: > 6560: %} > 6561: > 6562: instruct copySignD_imm_avx(regD dst, regD src, regD xtmp, immD zero) %{ These should be vlRegD. src/hotspot/cpu/x86/x86.ad line 6577: > 6575: match(Set dst (CopySignVF dst src)); > 6576: match(Set dst (CopySignVD dst src)); > 6577: effect(TEMP xtmp); vector_copy_sign_avx needs TEMP dst, so this may need two different instruct rules.
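For readers following the copySign thread, the operation itself reduces to a sign-bit select with logical operations, which is what an AND/OR instruction sequence would compute lane by lane. The following is a minimal scalar C++ sketch of that idea for `float`; it is an illustration only, not the code in the patch, and `copy_sign_bits` is an invented name.

```cpp
#include <cstdint>
#include <cstring>

// Returns a value with the magnitude of `magnitude` and the sign of `sign`,
// computed purely with bitwise operations on the IEEE 754 bit patterns.
float copy_sign_bits(float magnitude, float sign) {
  static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
  const uint32_t sign_mask = 0x80000000u;  // only the sign bit set

  uint32_t m_bits;
  uint32_t s_bits;
  std::memcpy(&m_bits, &magnitude, sizeof m_bits);
  std::memcpy(&s_bits, &sign, sizeof s_bits);

  // Keep everything except the sign from `magnitude`, take the sign from `sign`.
  const uint32_t r_bits = (m_bits & ~sign_mask) | (s_bits & sign_mask);

  float result;
  std::memcpy(&result, &r_bits, sizeof result);
  return result;
}
```

The same pattern applies to `double` with a 64-bit mask that has only the top bit set.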
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098998098 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2099009546 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098976333 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098977882 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2098980097 From sparasa at openjdk.org Wed May 21 00:10:57 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 00:10:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: <2NaspRzl10B9GKvX6njeN0Gr757FOTmyHiTLxA1NVK4=.7309dbee-6733-45bc-a5d8-6ab624b6a03d@github.com> On Sun, 18 May 2025 01:20:30 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 12942: > >> 12940: } >> 12941: >> 12942: void Assembler::evex_opcode_prefix_and_encode(int dst_enc, int nds_enc, int src_enc, int8_t imm8, VexSimdPrefix pre, VexOpcode opc, > > The only difference between this method and the one below is that it accepts an immediate shift count and modifies the opcode if we demote the instruction to the legacy / REX2 variant. > > The demotion logic and the rest of the logic are exactly the same. Should we merge these into one and then, based on the incoming opcode (i.e. if it's 0x24 or 0x2C), consider the immediate shift and the associated opcode pruning if demoted? The method below this got merged with the swap version and got bulkier.
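As a rough illustration of the demotion rule this PR discusses, the sketch below picks an encoding for a three-operand NDD register form from its register numbers. It is a simplified model, not the assembler's actual logic: the enum, the function name, and the plain-integer register numbering (0 to 31) are all invented here, and it assumes that an instruction requesting `no_flags` cannot be demoted because the legacy forms always update flags.

```cpp
#include <cassert>

// Hypothetical labels for the three encodings mentioned in the thread.
enum class Encoding { Legacy, Rex2, ExtendedEvex };

// dst, src1, src2 are register numbers in [0, 31]; r16..r31 are the APX
// extended registers, which the shorter forms can only reach via REX2.
Encoding choose_encoding(int dst, int src1, int src2, bool no_flags) {
  assert(0 <= dst && dst < 32);
  assert(0 <= src1 && src1 < 32);
  assert(0 <= src2 && src2 < 32);

  // Demotion only applies when the destination is also the first source
  // (and, by assumption, when flag suppression was not requested).
  if (dst != src1 || no_flags) {
    return Encoding::ExtendedEvex;
  }
  // Any extended register forces the REX2 form.
  if (dst >= 16 || src2 >= 16) {
    return Encoding::Rex2;
  }
  return Encoding::Legacy;
}
```

With this model, `eaddl r18, r18, r25` maps to the REX2 form and `eaddl r2, r2, r7` to the legacy form, matching the two examples in the PR description.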
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2099063256 From sparasa at openjdk.org Wed May 21 00:16:37 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 00:16:37 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v32] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. > > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: rename evex_opcode_prefix_and_encode with imm8 to emit_eevex_or_demote ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/d47ee0b5..95190a1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=30-31 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Wed May 21 00:16:37 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 00:16:37 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: <4xANhUPLFW33T1AmImCqr0L7LjQxdtmzdBULh6T5bEk=.95918caf-b803-42b6-8f16-4daa0f0028d0@github.com> References: <4xANhUPLFW33T1AmImCqr0L7LjQxdtmzdBULh6T5bEk=.95918caf-b803-42b6-8f16-4daa0f0028d0@github.com> Message-ID: 
<9HFEMXFnfxK0fLcjWx1eID8NZYWQ3WLhg5p7e5OopJI=.c75b4e9b-32b3-4ba3-b34c-9537dbd8d021@github.com> On Sun, 18 May 2025 01:07:06 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 6809: >> >>> 6807: >>> 6808: void Assembler::eshldl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) { >>> 6809: evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xA4, no_flags, true /* is_map1 */); >> >> Here the opcode is set to 0xA4, which is correct for the demotion case; if we don't demote, the opcode is 0x24. This is only relevant to NDD flavours of shldl/shldq with imm8 shift values. I suggest adding a comment here with a clear explanation, as it is not intuitive at first glance and the manual clearly specifies 0x24 as the opcode. > > A better idea is to pass the 0x24 opcode from the top level, which is what the manual says, and make the appropriate adjustment by ORing with 0x80 if we demote it to a REX2 / REX prefixed instruction. Please see this suggestion incorporated in the latest update.
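The opcode adjustment described in this exchange is a single OR. A minimal sketch, assuming only the opcode values named in the thread (0x24/0x2C for the NDD forms, 0xA4/0xAC after demotion); the constant and function names are hypothetical and not taken from the assembler:

```cpp
#include <cstdint>

// Opcodes for the imm8 NDD forms as named in the thread.
constexpr uint8_t kShldImm8NddOpcode = 0x24;
constexpr uint8_t kShrdImm8NddOpcode = 0x2C;

// The caller always passes the NDD opcode; when the instruction is demoted
// to a REX2 / REX prefixed form, the opcode is adjusted by ORing in 0x80.
constexpr uint8_t opcode_to_emit(uint8_t ndd_opcode, bool demoted) {
  return demoted ? static_cast<uint8_t>(ndd_opcode | 0x80) : ndd_opcode;
}
```

Keeping the documented opcode at the call site and deriving the demoted one in a single place avoids the situation the review comment points out, where 0xA4 appears at the top level without explanation.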
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2099066350 From sparasa at openjdk.org Wed May 21 00:16:38 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 00:16:38 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v27] In-Reply-To: References: Message-ID: <5xHb0NcnnM3Ym0XDZnPpEC6zwjq6CT7iw6OMPdZfdV0=.cca6c187-a4bf-4a2e-8e36-0ed72bf44825@github.com> On Sun, 18 May 2025 01:14:21 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Update x86-asmtest.py to enable demotion by default and make test generation optional > > src/hotspot/cpu/x86/assembler_x86.cpp line 6827: > >> 6825: >> 6826: void Assembler::eshrdl(Register dst, Register src1, Register src2, int8_t imm8, bool no_flags) { >> 6827: evex_opcode_prefix_and_encode(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAC, no_flags, true /* is_map1 */); > > Suggestion: > > emit_eevex_or_demote(dst->encoding(), src1->encoding(), src2->encoding(), imm8, VEX_SIMD_NONE, VEX_OPCODE_0F_3C /* MAP4 */, EVEX_32bit, 0xAC, no_flags, true /* is_map1 */); > > > Above nomenclature looks more appropriate here. Please see the name changed in the latest update. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2099066617 From sparasa at openjdk.org Wed May 21 00:46:32 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 00:46:32 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v33] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: Rename single argument evex_opcode_prefix_encode to emit_eevex ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/95190a1e..9a517c2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=31-32 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Wed May 21 00:52:03 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 00:52:03 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: On Fri, 16 May 2025 16:07:32 GMT, Jatin Bhateja wrote: >>> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] >> >> Hank's script is already handling the variant[2] in the `RegMemRegNddInstruction` class, for which no demotion is enabled. The demotion is enabled only for variant[1]. > >> > Hi @vamsi-parasa , I don't see demotion tests being generated with full mode gtest, i.e. 
python3 x86-asmtest.py --full >> >> Please see the updated `x86-asmtest.py` refactored to work with full set (`--full`). Please let me know if anything is missing. > > Hi @vamsi-parasa , > I am seeing some failures with --full mode when ENABLE_DEMOTION=False > /home/jatinbha/sandboxes/apx-release/jdk/test/hotspot/gtest/x86/test_assembler_x86.cpp:61: Failure > Failed > __ ecmovq (Assembler::Condition::greater, r31, r31, Address(rcx, rdx, (Address::ScaleFactor)0, +0x3c8d1915)); > OpenJDK: cc cc cc cc cc cc cc cc cc cc cc > GNU Assembler: 62 64 84 10 4f bc 11 15 19 8d 3c > [ FAILED ] AssemblerX86.validate_vm (13562 ms) > [----------] 1 test from AssemblerX86 (13708 ms total) Hi Jatin (@jatin-bhateja), Could you please look at the latest changes with almost all of your suggestions? Please let me know if anything else is missing. Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2896151024 From jbhateja at openjdk.org Wed May 21 01:05:24 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 May 2025 01:05:24 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v4] In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. 
Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. > > > 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 > EVEX OP MR SIB DISP IMM > --------------|----|----|----|---------------|-----| > 62 6b c1 40 25 84 ec 40 30 20 10 ff > > 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 > For full vector width operation, scalar matches with vector size, hence scale N = 64 > effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 > EVEX OP MR SIB DISP IMM > -------------|----|---|---|-----------|---| > 62 6b c1 40 25 44 ec 01 ff > > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25021/files - new: https://git.openjdk.org/jdk/pull/25021/files/0eead21e..030b5dc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=02-03 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25021/head:pull/25021 PR: https://git.openjdk.org/jdk/pull/25021 From jbhateja at openjdk.org Wed May 21 01:05:25 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 May 2025 01:05:25 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v3] In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Tue, 20 May 2025 20:21:07 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > 
src/hotspot/cpu/x86/assembler_x86.cpp line 5355: > >> 5353: assert(dst != xnoreg, "sanity"); >> 5354: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ true, /* uses_vl */ true); >> 5355: attributes.set_address_attributes(/* tuple_type */ EVEX_QVM, /* input_size_in_bits */ EVEX_NObit); > > This should have remained as EVEX_HVM. Correct. https://www.felixcloutier.com/x86/pmovzx#:~:text=VPMOVZXBW%20xmm1%20%7Bk1%7D%7Bz%7D%2C%20xmm2/m64-,B,-V/V https://www.felixcloutier.com/x86/pmovzx#:~:text=N/A-,B,Half%20Mem,-ModRM%3Areg%20(w Good. > src/hotspot/cpu/x86/assembler_x86.cpp line 5384: > >> 5382: InstructionMark im(this); >> 5383: InstructionAttr attributes(vector_len, /* rex_w */ false, /* legacy_mode */ _legacy_mode_bw, /* no_mask_reg */ false, /* uses_vl */ true); >> 5384: attributes.set_address_attributes(/* tuple_type */ EVEX_QVM, /* input_size_in_bits */ EVEX_NObit); > > This should have remained as EVEX_HVM. https://www.felixcloutier.com/x86/pmovzx#:~:text=VPMOVZXBW%20zmm1%20%7Bk1%7D%7Bz%7D%2C%20ymm2/m256-,B,-V/V https://www.felixcloutier.com/x86/pmovzx#:~:text=N/A-,B,Half%20Mem,-ModRM%3Areg%20(w Good > src/hotspot/cpu/x86/assembler_x86.cpp line 6089: > >> 6087: InstructionMark im(this); >> 6088: InstructionAttr attributes(AVX_128bit, /* rex_w */ false, /* legacy_mode */ _legacy_mode_vlbw, /* no_mask_reg */ true, /* uses_vl */ true); >> 6089: attributes.set_address_attributes(/* tuple_type */ EVEX_FV, /* input_size_in_bits */ EVEX_NObit); > > This should have remained as EVEX_FVM. 
Good > src/hotspot/cpu/x86/assembler_x86.cpp line 11395: > >> 11393: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); >> 11394: InstructionAttr attributes(vector_len, /* vex_w */ false,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); >> 11395: attributes.set_address_attributes(/* tuple_type */ EVEX_FV,/* input_size_in_bits */ EVEX_NObit); > > This should have remained as EVEX_FVM. Correct, https://www.felixcloutier.com/x86/pabsb:pabsw:pabsd:pabsq#:~:text=EVEX.512.66.0F38.WIG%201C%20/r%20VPABSB%20zmm1%20%7Bk1%7D%7Bz%7D%2C%20zmm2/m512 Good. > src/hotspot/cpu/x86/assembler_x86.cpp line 11423: > >> 11421: assert(VM_Version::supports_avx512bw() && (vector_len == AVX_512bit || VM_Version::supports_avx512vl()), ""); >> 11422: InstructionAttr attributes(vector_len, /* vex_w */ false,/* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true); >> 11423: attributes.set_address_attributes(/* tuple_type */ EVEX_FV,/* input_size_in_bits */ EVEX_NObit); > > This should have remained as EVEX_FVM. Correct, It does not have an embedded broadcasting variant. https://www.felixcloutier.com/x86/pabsb:pabsw:pabsd:pabsq#:~:text=EVEX.512.66.0F38.WIG%201D%20/r%20VPABSW%20zmm1%20%7Bk1%7D%7Bz%7D%2C%20zmm2/m512 Good. 
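As background for the tuple-type corrections above: the tuple type, together with vector length and element size, determines the scale factor N used for compressed disp8 encoding. The sketch below shows only the arithmetic from the PR description (a displacement is encodable in one byte when it is a multiple of N and the quotient fits in a signed byte). It is a simplified model with invented function names, it takes N as an input rather than deriving it from a tuple type, and it ignores the case where no displacement byte is emitted at all.

```cpp
#include <cassert>
#include <cstdint>

// True if `disp` can be encoded as a compressed 8-bit displacement,
// i.e. disp is a multiple of scale_n and disp / scale_n fits in int8_t.
bool fits_compressed_disp8(int32_t disp, int32_t scale_n) {
  assert(scale_n > 0);
  if (disp % scale_n != 0) {
    return false;
  }
  const int32_t compressed = disp / scale_n;
  return compressed >= -128 && compressed <= 127;
}

// Number of displacement bytes in this simplified model: 1 or 4.
int displacement_bytes(int32_t disp, int32_t scale_n) {
  return fits_compressed_disp8(disp, scale_n) ? 1 : 4;
}
```

For the two vpternlogq examples in the PR description (N = 64), displacement 0x40 compresses to the single byte 0x01, while 0x10203040 needs the full four bytes. A wrong tuple type yields a wrong N, and therefore a wrong choice between the two forms.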
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2099102197 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2099102266 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2099102793 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2099102382 PR Review Comment: https://git.openjdk.org/jdk/pull/25021#discussion_r2099102325 From xgong at openjdk.org Wed May 21 01:33:57 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Wed, 21 May 2025 01:33:57 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:17:26 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Merge master > - Remove additional spaces in the aarch64_vector_ad.m4 file > - Address review comments > - 8355585: Aarch64: Add aarch64 backend for Float16 vector operations > > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector > operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > All JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) > pass on aarch64 which also includes the JTREG test to test the FP16 > vector operations - test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java LGTM. Thanks! ------------- Marked as reviewed by xgong (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25096#pullrequestreview-2855968108 From jbhateja at openjdk.org Wed May 21 01:57:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 21 May 2025 01:57:35 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v5] In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. > > > 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 > EVEX OP MR SIB DISP IMM > --------------|----|----|----|---------------|-----| > 62 6b c1 40 25 84 ec 40 30 20 10 ff > > 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 > For full vector width operation, scalar matches with vector size, hence scale N = 64 > effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 > EVEX OP MR SIB DISP IMM > -------------|----|---|---|-----------|---| > 62 6b c1 40 25 44 ec 01 ff > > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review Resoultions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25021/files - new: https://git.openjdk.org/jdk/pull/25021/files/030b5dc0..ca4e714c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25021&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25021/head:pull/25021 PR: https://git.openjdk.org/jdk/pull/25021 From fjiang at openjdk.org Wed May 21 02:00:55 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 21 May 2025 02:00:55 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Tue, 20 May 2025 12:50:08 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the riscv Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic substantially reduces Unsafe setMemory time. >> >> On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below. >> >> Before the patch:
>>
>> Benchmark (aligned) (size) Mode Cnt Score Error Units
>> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op
>> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op
>> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op
>> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op
>> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op
>> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op
>> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op
>> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op
>> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op
>> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op
>> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op
>> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op
>> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op
>> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op
>> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op
>> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op
>> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op
>> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op
>> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op
>> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op
>> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op
>> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op
>> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op
>> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/o...
> > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > change all the t0 with tmp_reg Looks good! Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/23890#pullrequestreview-2856006591 From fyang at openjdk.org Wed May 21 03:03:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 21 May 2025 03:03:54 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:02:43 GMT, Robbin Ehn wrote: >> Hi, please consider.
>> >> While working on https://github.com/openjdk/jdk/pull/25252, I notice: >> - Major op code was just repeat >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes Latest version looks great! Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25253#pullrequestreview-2856098290 From chagedorn at openjdk.org Wed May 21 05:14:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 05:14:51 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence In-Reply-To: References: Message-ID: <_y8LMhDg7b8EuWLikdsmgK0nUCCh0Y2PH0LX_-TpsD4=.fa9bb975-a850-4223-825e-726dcc5a74f2@github.com> On Tue, 20 May 2025 00:49:49 GMT, Vladimir Ivanov wrote: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. 
> > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks src/hotspot/share/opto/phasetype.hpp line 88: > 86: flags(PHASEIDEALLOOP2, "PhaseIdealLoop 2") \ > 87: flags(PHASEIDEALLOOP3, "PhaseIdealLoop 3") \ > 88: flags(OPTIMIZE_RF, "Optimize Reachability Fences") \ Another drive-by comment: I suggest to use the full word since most people are probably not aware of this abbreviation when looking at graph dumps in IGV. 
You should also add this phase to the IR framework [CompilePhases](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java). Suggestion: flags(OPTIMIZE_REACHABILITY_FENCES, "Optimize Reachability Fences") \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2099361108 From roland at openjdk.org Wed May 21 07:45:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 07:45:56 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:59:18 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. 
>> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ...
> > Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f > I also looked back at the results, I think it was this: [#21630 (comment)](https://github.com/openjdk/jdk/pull/21630#issuecomment-2587016221) > > Nice to see that your patch is faster for small iterations. But what I am missing: where do the lines cross? I.e. at what iteration count does the loop-nest become profitable? The lines never cross. There's a constant benefit from running with this change: HeapMismatchManualLoopTest.segment_mismatch 4 avgt 30 2.922 ± 0.004 ns/op vs HeapMismatchManualLoopTest.segment_mismatch 4 avgt 30 5.287 ± 0.005 ns/op around 2.5 ns/op For a larger size: HeapMismatchManualLoopTest.segment_mismatch 1024 avgt 30 165.201 ± 0.077 ns/op vs HeapMismatchManualLoopTest.segment_mismatch 1024 avgt 30 168.175 ± 0.131 ns/op The ~2.5 ns/op difference is still there but doesn't weigh much anymore. For an even larger size: HeapMismatchManualLoopTest.segment_mismatch 131072 avgt 30 20922.112 ±
124.683 ns/op vs HeapMismatchManualLoopTest.segment_mismatch 131072 avgt 30 20995.956 ± 191.092 ns/op It should still be there but gets lost in the noise. The only reason there was a `ShortLoopIter` in the initial patch was because setting it to the value of `LoopStripMiningIter` allows removal of the outer strip mined loop as well. But as @merykitty remarked, what we can win from doing this we could lose elsewhere because the simplification would not trigger if profile data is polluted. The current patch skips the loop nest for a number of iterations that is as large as possible and that should be a very large number of iterations (hundreds of millions). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2896932626 From roland at openjdk.org Wed May 21 07:45:58 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 07:45:58 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 09:08:41 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/c2_globals.hpp line 839: >> >>> 837: product(bool, ShortRunningLongLoop, true, DIAGNOSTIC, \ >>> 838: "long counted loop/long range checks: don't create loop nest if " \ >>> 839: "loop runs for small enough number of iterations.") \ >> >> It sounds like we are doing this: >> Disable an exception, which disables an optimization. >> >> This double negation can be a little confusing / ambiguous. >> >> I wonder if we should instead have a limit here, which we can move to 0, or higher. >> That would allow us to benchmark with different levels more easily too. > > It could be named `ShortRunningLongLoopIterationLimit`. See https://github.com/openjdk/jdk/pull/21630#issuecomment-2896932626 I don't think this makes sense. As long as we can avoid the loop nest, that should be beneficial. There's no benefit to the loop nest but it can be required for correctness.
So I don't expect we want to tune anything. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099594700 From bkilambi at openjdk.org Wed May 21 07:52:00 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 21 May 2025 07:52:00 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 16:49:20 GMT, Andrew Haley wrote: > Looks good. I'm assuming you've tested both SVE and Neon. Yes, this was tested on both SVE and Neon (N1/V1/V2 architectures). ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2896952785 From roland at openjdk.org Wed May 21 07:52:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 07:52:55 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v23] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. 
> > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 74 commits: - more - compilation fix - new benchmark - Add benchmark for manual mismatch loop - review - Merge branch 'master' into JDK-8342692 - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/graphKit.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Christian Hagedorn - ... and 64 more: https://git.openjdk.org/jdk/compare/d1032d71...a0726ac7 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=22 Stats: 1485 lines in 25 files changed: 1431 ins; 13 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Wed May 21 07:52:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 07:52:56 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 08:10:14 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with 11 additional commits since the last revision: >> >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/graphKit.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> 
Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - ... and 1 more: https://git.openjdk.org/jdk/compare/b0129598...2164c15f > > src/hotspot/share/opto/loopnode.hpp line 219: > >> 217: >> 218: virtual void set_trip_count(julong tc) = 0; >> 219: virtual julong trip_count() = 0; > > GitHub Actions seems to disagree with something here ;) > > /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:219:18: error: ‘virtual julong BaseCountedLoopNode::trip_count()’ was hidden [-Werror=overloaded-virtual] > 219 | virtual julong trip_count() = 0; > | ^~~~~~~~~~ > /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:310:10: note: by ‘julong CountedLoopNode::trip_count() const’ > 310 | julong trip_count() const { return _trip_count; } > | ^~~~~~~~~~ Right. I wasn't done with @chhagedorn's review. I was waiting for the answer to: https://github.com/openjdk/jdk/pull/21630#discussion_r2091476799 I pushed those changes now (but I still need Christian to clarify his comment) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099606003 From fjiang at openjdk.org Wed May 21 07:54:00 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 21 May 2025 07:54:00 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 08:40:35 GMT, Dingli Zhang wrote: >> As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: >> Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. >> >> Currently, there is a similar situation on RISC-V.
Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. >> >> ### Testing >> qemu-system 9.2.3 with UseRVV (ubuntu24.10): >> * [x] Run test/jdk/jdk/incubator/vector (fastdebug) >> * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove added ins_cost(VEC_COST) due to merging the main branch > - Merge branch 'master' into master-remove-ins_cost > - 8356924: RISC-V: Clean up cost for vector instructions Looks fine. ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/25221#pullrequestreview-2856679495 From roland at openjdk.org Wed May 21 08:00:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 08:00:56 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate.
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 75 commits: - Merge branch 'master' into JDK-8342692 - more - compilation fix - new benchmark - Add benchmark for manual mismatch loop - review - Merge branch 'master' into JDK-8342692 - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/graphKit.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - ... and 65 more: https://git.openjdk.org/jdk/compare/50a7755f...6100f757 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=23 Stats: 1485 lines in 25 files changed: 1431 ins; 13 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Wed May 21 08:00:57 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 08:00:57 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v16] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 09:01:53 GMT, Maurizio Cimadamore wrote: > > Right. But it's Maurizio's benchmark. I think it would make sense to integrate it separately. What do you think @mcimadamore ? > > I tend to agree that it would be better to have the benchmark checked in as part of this change. Sure. Sounds good to me. I've added the test I have.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2896974911 From epeter at openjdk.org Wed May 21 08:05:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 08:05:51 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 07:49:10 GMT, Bhavana Kilambi wrote: >> Looks good. I'm assuming you've tested both SVE and Neon. > >> Looks good. I'm assuming you've tested both SVE and Neon. > > Yes, this was tested on both SVE and Neon (N1/V1/V2 architectures). @Bhavana-Kilambi I'm getting timeouts with your new test: `compiler/vectorization/TestFloat16VectorOperations.java` At least on `linux-aarch64-debug` and `windows-x64-debug`, but not all tests have completed yet, so more could be failing. Not sure if it is relevant, but both had extra flag `-XX:-UseTLAB`, we add this flag in our additional stress testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2896991985 From epeter at openjdk.org Wed May 21 08:09:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 08:09:22 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v48] In-Reply-To: References: Message-ID: <_-o-2dMr_bJwXL4RIPC_v36vS1c9AI3RmpDyFI3qohY=.c28e53a3-4638-4956-916c-712792a4bcfe@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: tests for name duplication ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/8d3318d7..dc70aed9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=46-47 Stats: 156 lines in 1 file changed: 154 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From shade at openjdk.org Wed May 21 08:28:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 May 2025 08:28:27 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch Message-ID: Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. Also tidied up some comments around it. 
Additional testing; - [x] Linux x86_64 server fastdebug, `tier1 tier2` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25343/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25343&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357434 Stats: 16 lines in 3 files changed: 0 ins; 11 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25343/head:pull/25343 PR: https://git.openjdk.org/jdk/pull/25343 From epeter at openjdk.org Wed May 21 08:38:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 08:38:56 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v49] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: generateWithHashtagAndDollarReplacements2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/dc70aed9..4206d647 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=47-48 Stats: 44 lines in 1 file changed: 38 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From chagedorn at openjdk.org Wed May 21 08:40:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 08:40:09 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Thu, 15 May 2025 15:31:57 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 1172: >> >>> 1170: // Only process if we are in the correct Predicate Block. >>> 1171: return; >>> 1172: } >> >> Do we really need this check? Could we not just clone all Template Assertion Predicates that we find? I think with the recent Assertion Predicate changes, we are sure that all Template Assertion Predicates found belong to this loop. Otherwise, they would already be marked useless and `visit()` is not called on them. > > Well, I trust you on that. Things have changed quite a bit recently with Assertion Predicates and it's hard to keep up! That's true, a lot has changed. But I think it should be safe. Are you planning to integrate this in JDK 25 or could it also go into JDK 26? In the latter case, we have some more baking time. 
>> src/hotspot/share/opto/loopnode.cpp line 1191: >> >>> 1189: loop->compute_trip_count(this, bt); >>> 1190: // Loop must run for no more than iter_limits as it guarantees no overflow of scale * iv in long range checks. >>> 1191: bool known_short_running_loop = head->trip_count() <= iters_limit / ABS(stride_con); >> >> Can you also add a comment about the decision of the hardcoded `iters_limit / ABS(stride_con)` limit to indicate a short running long loop? > > Rather than `ShortLoopIter`? Or is it something else that you'd like to be better explained? Yes, rather than `ShortLoopIter`. One could think that it could be a good idea to have such a flag. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099701484 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099704551 From roland at openjdk.org Wed May 21 08:40:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 08:40:10 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Wed, 21 May 2025 08:34:47 GMT, Christian Hagedorn wrote: >> Well, I trust you on that. Things have changed quite a bit recently with Assertion Predicates and it's hard to keep up! > > That's true, a lot has changed. But I think it should be safe. Are you planning to integrate this in JDK 25 or could it also go into JDK 26? In the latter case, we have some more baking time. JDK 26 sounds ok to me. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099704534 From chagedorn at openjdk.org Wed May 21 08:40:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 08:40:12 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: <9rahRXZYfhJgwGH3fBJ-2riIuIcKJmMh4brJAmRDmN4=.d1b4bcaa-4886-415f-9a64-cb15570483c3@github.com> On Wed, 21 May 2025 07:49:01 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.hpp line 219: >> >>> 217: >>> 218: virtual void set_trip_count(julong tc) = 0; >>> 219: virtual julong trip_count() = 0; >> >> GitHub Actions seems to disagree with something here ;) >> >> /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:219:18: error: 'virtual julong BaseCountedLoopNode::trip_count()' was hidden [-Werror=overloaded-virtual] >> 219 | virtual julong trip_count() = 0; >> | ^~~~~~~~~~ >> /home/runner/work/jdk/jdk/src/hotspot/share/opto/loopnode.hpp:310:10: note: by 'julong CountedLoopNode::trip_count() const' >> 310 | julong trip_count() const { return _trip_count; } >> | ^~~~~~~~~~ > > Right. I wasn't done with @chhagedorn 's review. I was waiting for the answer to: https://github.com/openjdk/jdk/pull/21630#discussion_r2091476799 > I pushed those changes now (but I still need Christian to clarify his comment) Sorry, lagging behind a little. Answered above :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099707530 From fjiang at openjdk.org Wed May 21 08:41:03 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 21 May 2025 08:41:03 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:02:43 GMT, Robbin Ehn wrote: >> Hi, please consider.
>> >> While working on https://github.com/openjdk/jdk/pull/25252, I noticed: >> - Major opcode was just repeated >> - Width was coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes Overall looks good, with one minor suggestion. src/hotspot/cpu/riscv/assembler_riscv.hpp line 3307: > 3305: > 3306: // -------------------------- > 3307: void sd(Register Rd, Register Rs, const int32_t offset) { We can rename `Rd`/`Rs` to `Rs2`/`Rs1` to be more consistent with the specification. src/hotspot/cpu/riscv/assembler_riscv.hpp line 3322: > 3320: > 3321: // -------------------------- > 3322: void sw(Register Rd, Register Rs, const int32_t offset) { Same here. src/hotspot/cpu/riscv/assembler_riscv.hpp line 3337: > 3335: > 3336: // -------------------------- > 3337: void fsd(FloatRegister Rd, Register Rs, const int32_t offset) { And here.
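To make the `Rs2`/`Rs1` naming suggestion concrete, here is a small sketch of how a RISC-V S-type store is laid out according to the base ISA specification: `rs1` is the base address register, `rs2` is the register whose value is stored, and the 12-bit offset is split into `imm[11:5]` and `imm[4:0]`. This is a standalone Java illustration, not HotSpot's assembler code; the class and method names are made up for the example.

```java
public class StoreEncodingSketch {
    static final int OPCODE_STORE = 0b0100011; // major opcode shared by integer stores
    static final int FUNCT3_SW = 0b010;        // width: 32-bit word
    static final int FUNCT3_SD = 0b011;        // width: 64-bit doubleword

    // Hypothetical helper: encodes "store rs2, offset(rs1)" as an S-type instruction.
    static int encodeStore(int funct3, int rs2, int rs1, int offset) {
        if (rs1 < 0 || rs1 > 31 || rs2 < 0 || rs2 > 31) {
            throw new IllegalArgumentException("register number out of range");
        }
        if (offset < -2048 || offset > 2047) {
            throw new IllegalArgumentException("offset does not fit in 12 bits");
        }
        int imm = offset & 0xFFF;                // two's complement, 12 bits
        return ((imm >> 5) << 25)                // imm[11:5]
                | (rs2 << 20)                    // source register (value to store)
                | (rs1 << 15)                    // base address register
                | (funct3 << 12)                 // access width
                | ((imm & 0x1F) << 7)            // imm[4:0]
                | OPCODE_STORE;
    }

    public static void main(String[] args) {
        // sd ra, 8(sp): ra is x1 (rs2), sp is x2 (rs1)
        System.out.printf("%08x%n", encodeStore(FUNCT3_SD, 1, 2, 8)); // 00113423
    }
}
```

Since a store has no destination register, naming the parameters after the `rs2` and `rs1` fields makes it harder to mix up the stored value and the base address.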
------------- PR Review: https://git.openjdk.org/jdk/pull/25253#pullrequestreview-2856821785 PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2099704221 PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2099704656 PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2099707612 From chagedorn at openjdk.org Wed May 21 08:43:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 08:43:00 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: <4YJGCFNQiMY-je1xnPrWML9F8S0ys5UG312pQh3m5uo=.71599aa3-2e69-4a2c-bf42-ca8cdf7b0db4@github.com> On Thu, 15 May 2025 15:25:16 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 1225: >> >>> 1223: // Predicate). The current limit could, itself, be dependent on an existing predicate. Clone parse and template >>> 1224: // assertion predicates below existing predicates to get proper ordering of predicates when walking from the loop >>> 1225: // up: future predicates, Short Running Long Loop Predicate, existing predicates. >> >> Maybe you missed the visualization I've added in a comment for an earlier commit. I would find it quite useful to quickly grasp the idea, what do you think? >> >> >> // >> // Existing Hoisted >> // Check Predicates >> // | >> // New Short Running Long >> // Loop Predicate >> // | >> // Cloned Parse Predicates and >> // Template Assertion Predicates >> // | >> // Loop > > I must have missed it. Sorry about that. No worries! 
Thanks for adding it ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099718602 From chagedorn at openjdk.org Wed May 21 08:47:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 08:47:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Thu, 15 May 2025 15:26:34 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 1269: >> >>> 1267: } >>> 1268: #endif >>> 1269: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl); >> >> It looks like this line rather belongs to the `Predicate` on L1275? Might have been moved here by accident. > > I don't think that's the case. Predicates were added so `entry_control` needs to be refreshed. You're right, I thought we need the skipping of strip mined loop below as well. But we don't have a strip mined loop for the long case - so all good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099725506 From epeter at openjdk.org Wed May 21 08:48:03 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 08:48:03 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v49] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 08:38:56 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > generateWithHashtagAndDollarReplacements2 A quick update on the remaining TODOs: - More examples with how `DataName` interacts with `Hook.insert`. - Add some documentation/comments to `TestTemplate.java`, so users can use it as a reference. - Document the allowed characters for `$name` and `#name`. Add validation and tests. - Idea from @robcasloz: Extend the `$name` and `#name` patterns, to allow `${name}` and `#{name}`, so it is easier to format things like `#{type}_CON`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2897123252 From fyang at openjdk.org Wed May 21 08:50:53 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 21 May 2025 08:50:53 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 08:40:35 GMT, Dingli Zhang wrote: >> As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: >> Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. >> >> Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. >> >> ### Testing >> qemu-system 9.2.3 with UseRVV (ubuntu24.10): >> * [x] Run test/jdk/jdk/incubator/vector (fastdebug) >> * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove added ins_cost(VEC_COST) due to merging the main branch > - Merge branch 'master' into master-remove-ins_cost > - 8356924: RISC-V: Clean up cost for vector instructions Thanks! ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25221#pullrequestreview-2856863293 From dnsimon at openjdk.org Wed May 21 08:56:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 08:56:59 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 06:14:09 GMT, Yudi Zheng wrote: >> This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > fix tests Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23159#pullrequestreview-2856879701 From yzheng at openjdk.org Wed May 21 08:56:59 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 May 2025 08:56:59 GMT Subject: RFR: 8334717: Add JVMCI support for APX EGPRs [v2] In-Reply-To: References: Message-ID: <8B0cGaejoT19Paf9ccpOje3O6DccoOuE2nm8G6o0gVY=.abd66730-d1c7-4819-9bae-ecdf78fa8e9b@github.com> On Tue, 20 May 2025 06:14:09 GMT, Yudi Zheng wrote: >> This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > fix tests thanks for the review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23159#issuecomment-2897150185 From yzheng at openjdk.org Wed May 21 08:56:59 2025 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 21 May 2025 08:56:59 GMT Subject: Integrated: 8334717: Add JVMCI support for APX EGPRs In-Reply-To: References: Message-ID: On Thu, 16 Jan 2025 16:01:32 GMT, Yudi Zheng wrote: > This PR marks extra general purpose registers introduced by Intel APX as Graal allocatables. It also drops AMD64/AArch64/RISCV64.flags and RegisterArray This pull request has now been integrated. Changeset: 735c7899 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/735c7899d124a4e0c9579ea7802c9475eaedda10 Stats: 561 lines in 21 files changed: 44 ins; 334 del; 183 mod 8334717: Add JVMCI support for APX EGPRs Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/23159 From roland at openjdk.org Wed May 21 09:21:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 09:21:26 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v7] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, The `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. 
If, during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`. A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj`s should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`.
> > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 33 additional commits since the last revision: - new test tweak - new test - Merge branch 'master' into JDK-8327963 - Merge branch 'master' into JDK-8327963 - typo - more - more - more - more - more - ... and 23 more: https://git.openjdk.org/jdk/compare/9b8450bf...a6c6c044 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/af8480c0..a6c6c044 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=05-06 Stats: 329883 lines in 3787 files changed: 107563 ins; 204497 del; 17823 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Wed May 21 09:21:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 09:21:26 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: <6qes6PQ7RtzBVWkPkTopPLs5E9E5SGOekPi_qwBMu1A=.26952f0f-2050-4431-a8f4-0949202510d4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <6qes6PQ7RtzBVWkPkTopPLs5E9E5SGOekPi_qwBMu1A=.26952f0f-2050-4431-a8f4-0949202510d4@github.com> Message-ID: On Tue, 20 May 2025 14:50:36 GMT, Roland Westrelin wrote: > I'll add that test case to the PR. Done. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2897223161 From roland at openjdk.org Wed May 21 09:21:27 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 09:21:27 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> Message-ID: On Thu, 15 May 2025 12:09:23 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, the `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If, during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj`s should return the right adr type for its slice.
For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with 14 additional commits since the last revision: > > - typo > - more > - more > - more > - more > - more > - more > - more > - more > - review > - ... and 4 more: https://git.openjdk.org/jdk/compare/7afc47e4...af8480c0 > I also think it would be good to investigate, separately, early elimination of dead array allocations, even after the integration of this work. Dead allocations may inhibit later optimizations so it would be good to eliminate them as early as possible anyway.
One difficulty (not addressed in [c28f81a](https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58)) is that early array elimination should still generate the nonnegative array size check code. That makes sense. It would be useful to have a bug to track that one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2897225462 From mli at openjdk.org Wed May 21 09:22:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 21 May 2025 09:22:54 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Tue, 20 May 2025 12:50:08 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add the RISC-V Unsafe::setMemory intrinsic's generator, generate_unsafe_setmemory. This intrinsic noticeably reduces Unsafe::setMemory time. >> >> On my MuseBook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below. >> >> before the patch >>
>> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
>> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
>> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
>> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
>> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
>> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
>> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
>> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
>> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
>> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
>> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
>> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
>> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
>> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
>> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
>> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
>> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
>> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
>> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/o...
> > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > change all the t0 with tmp_reg > @feilongjiang @RealFYang The MemorySegmentFillUnsafe.unsafe test shows that the time drops from `29.728 ± 0.294` to `23.747 ± 0.215` when the count is 7, which is a very good result. Thanks for the commit! Below is the JMH test result. Based on the performance data after unrolling, the comparison of unaligned and aligned data of `MemorySegmentFillUnsafe.unsafe` suggests that it could bring some benefit to merge these 2 pieces of code, i.e. keep only the unaligned one and remove the aligned one. But I'm not sure, maybe it's worth a try? At least it can reduce the generated code size. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1724: > 1722: } > 1723: > 1724: // Remaining count is less than 8 bytes and address is heapword aligned. remove this aligned code. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1748: > 1746: } > 1747: > 1748: // Handle copies less than 8 bytes keep this unaligned code.
------------- PR Review: https://git.openjdk.org/jdk/pull/23890#pullrequestreview-2856957909 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2099796385 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2099796462 From roland at openjdk.org Wed May 21 09:35:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 09:35:20 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v25] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, so the short running loop predicate has to be the > one that's further away from the loop entry. It is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), so we > can have new predicates inserted after the transformation that > depend on the cast limit, which itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ...
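The static condition discussed in this thread (`head->trip_count() <= iters_limit / ABS(stride_con)`) can be sketched in isolation as follows. This is a Java stand-in for illustration only, not the C2 code: the class and method names are hypothetical, the parameter types are assumptions, and the trip count is treated as unsigned to mirror the `julong` return type quoted in the review.

```java
public class ShortRunningLoopSketch {
    // Hypothetical stand-in for the check quoted from loopnode.cpp.
    // A loop is considered short running when its trip count does not exceed
    // itersLimit / |strideCon|, the bound the quoted comment ties to
    // "no overflow of scale * iv in long range checks".
    static boolean knownShortRunning(long tripCount, long itersLimit, long strideCon) {
        if (strideCon == 0 || strideCon == Long.MIN_VALUE) {
            throw new IllegalArgumentException("stride must be non-zero and negatable");
        }
        long bound = itersLimit / Math.abs(strideCon);
        // tripCount is compared as an unsigned value (assumption, mirrors julong).
        return Long.compareUnsigned(tripCount, bound) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(knownShortRunning(100, 1_000_000, 4));      // true:  100 <= 250000
        System.out.println(knownShortRunning(300_000, 1_000_000, -4)); // false: 300000 > 250000
    }
}
```

The values passed for `itersLimit` are arbitrary example numbers; the actual limit used by C2 is not stated in this thread.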
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/6100f757..e55393ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=23-24 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Wed May 21 09:35:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 09:35:20 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v20] In-Reply-To: References: <8qCb90dePuowml3bmtaa-dWvdY57rYEg1MfFHRIRAro=.da8f8cc2-96aa-486c-a616-dbbdb123a003@github.com> Message-ID: On Wed, 21 May 2025 08:36:17 GMT, Christian Hagedorn wrote: >> Rather than `ShortLoopIter`? Or is it something else that you'd like to be better explained? > > Yes, rather than `ShortLoopIter`. One could think that it could be a good idea to have such a flag. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099826541 From shade at openjdk.org Wed May 21 09:39:13 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 May 2025 09:39:13 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v18] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. 
> > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Spin lock induces false sharing - Merge branch 'master' into JDK-8231269-compile-task-weaks - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Fix release builds - More thorough locking and redefinition escape hatch - Fix build failures: add more headers - ... 
and 22 more: https://git.openjdk.org/jdk/compare/a0cdf36b...51390bc2 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=17 Stats: 427 lines in 12 files changed: 384 ins; 19 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From duke at openjdk.org Wed May 21 09:40:52 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 21 May 2025 09:40:52 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Tue, 20 May 2025 12:50:08 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time >> >> on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below >> >> before the patch >>
>> Benchmark                       (aligned)  (size)  Mode  Cnt    Score    Error  Units
>> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ±  0.392  ns/op
>> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ±  0.013  ns/op
>> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ±  0.045  ns/op
>> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ±  0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ±  0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ±  0.061  ns/op
>> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ±  0.096  ns/op
>> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ±  0.197  ns/op
>> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ±  0.195  ns/op
>> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ±  0.019  ns/op
>> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ±  2.795  ns/op
>> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ±  0.936  ns/op
>> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ±  0.879  ns/op
>> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ±  0.756  ns/op
>> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ±  0.088  ns/op
>> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ±  0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ±  0.037  ns/op
>> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ±  0.010  ns/op
>> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ±  0.020  ns/op
>> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ±  0.016  ns/op
>> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ±  0.086  ns/op
>> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ±  0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ±  0.043  ns/op
>> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ±  0.228  ns/o...
> > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > change all the t0 with tmp_reg Thanks for your review! I think the above test results may not fully reflect the difference in the impact of aligned and unaligned on the tail? I understand that if the dest address is aligned, the above aligned section has 0 to 4 less store instructions than the following section. I can remove it and test jmh to see how it performs > > @feilongjiang @RealFYang MemorySegmentFillUnsafe.unsafe Test show that the time reduce from `29.728 ± 0.294` to `23.747 ± 0.215` when the count is 7. which produce very good effects, thanks for commit!! below is the jmh test result > > Based on the performance data after unroll, the comparison of unaligned and aligned data of `MemorySegmentFillUnsafe.unsafe` suggests that it could bring some benefit to merge these 2 pieces of code, i.e. keep only unaligned one and remove the aligned one. 
But I'm not sure, maybe it's worth a try? At least it can reduce the generated code size. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2897287517 From mli at openjdk.org Wed May 21 09:51:58 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 21 May 2025 09:51:58 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Wed, 21 May 2025 09:38:06 GMT, Anjian-Wen wrote: > Thanks for your review! I think the above test results may not fully reflect the difference in the impact of aligned and unaligned on the tail? I understand that if the dest address is aligned, the above aligned section has 0 to 4 less store instructions than the following section. I can remove it and test jmh to see how it performs Based on your last jmh data (check `MemorySegmentFillUnsafe.unsafe true 7` and `MemorySegmentFillUnsafe.unsafe false 7`, and others <= 7, they're the same. I guess the pipeline and store buffer deal with this continuous stores well enough. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2897318832 From rcastanedalo at openjdk.org Wed May 21 10:05:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 21 May 2025 10:05:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v5] In-Reply-To: <6qes6PQ7RtzBVWkPkTopPLs5E9E5SGOekPi_qwBMu1A=.26952f0f-2050-4431-a8f4-0949202510d4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <6qes6PQ7RtzBVWkPkTopPLs5E9E5SGOekPi_qwBMu1A=.26952f0f-2050-4431-a8f4-0949202510d4@github.com> Message-ID: On Tue, 20 May 2025 14:50:36 GMT, Roland Westrelin wrote: > > I still think it would be good to include test cases to confirm that these are not only theoretical concerns, but that should not block the progress of this PR. > > Here is a test case:
>
> ```
> import java.util.Arrays;
>
> public class TestAllocNoUseBadMemoryState {
>     private static volatile int volatileField;
>
>     public static void main(String[] args) {
>         boolean[] allTrue = new boolean[3];
>         Arrays.fill(allTrue, true);
>         A a = new A();
>         boolean[] allFalse = new boolean[3];
>         for (int i = 0; i < 20_000; i++) {
>             a.field1 = 0;
>             test1(a, allTrue);
>             test1(a, allFalse);
>             if (a.field1 != 42) {
>                 throw new RuntimeException("Lost Store");
>             }
>         }
>     }
>
>     private static void test1(A otherA, boolean[] flags) {
>         if (flags == null) {
>         }
>         otherA.field1 = 42;
>         for (int i = 0; i < 3; i++) {
>             A a = new A();
>             if (flags[i]) {
>                 break;
>             }
>         }
>     }
>
>     private static class A {
>         int field1;
>     }
> }
> ```
>
> where all the damage is done early on when EA runs. A pass of loop opts before EA fully unrolls the loop and creates memory `Phi`s with incorrect `adr_type` (raw memory). Then EA removes the allocation. 
All that keeps the `Store` to `field1` alive then is uncommon traps from template predicates. Once they are removed, the `Store` goes away (first round of loop opts after EA). > > I'll add that test case to the PR. Thanks Roland for taking the time to research this, this failure really illustrates why the general solution proposed by this PR is needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2897374573 From rcastanedalo at openjdk.org Wed May 21 10:15:01 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 21 May 2025 10:15:01 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v7] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: On Wed, 21 May 2025 09:21:26 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. 
>> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 33 additional commits since the last revision: > > - new test tweak > - new test > - Merge branch 'master' into JDK-8327963 > - Merge branch 'master' into JDK-8327963 > - typo > - more > - more > - more > - more > - more > - ... and 23 more: https://git.openjdk.org/jdk/compare/4f595bfb...a6c6c044 test/hotspot/jtreg/compiler/macronodes/TestEarlyEliminationOfAllocationWithoutUse.java line 54: > 52: private static void test1(A otherA, boolean[] flags) { > 53: if (flags == null) { > 54: } Consider removing these two lines, which do not seem essential to reproduce the issue. Removing them also gets rid of the deoptimization, which simplifies the failure analysis. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2099904747 From chagedorn at openjdk.org Wed May 21 10:26:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 10:26:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v25] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 09:35:20 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). 
>> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for all the updates! I like the new documentation. Some final comments, then I think it's good to go from my side. > JDK 26 sounds ok to me. Sounds good. 
We might want to run some more testing after the fork when it's fully reviewed. src/hotspot/share/opto/predicates.hpp line 83: > 81: * int counted loops with long range checks for which a loop nest also needs to be created > 82: * in the general case (so the transformation of long range checks to int range checks is > 83: * legal). Nice, thanks for adding it! test/hotspot/jtreg/compiler/c2/irTests/TestLongRangeChecks.java line 41: > 39: public class TestLongRangeChecks { > 40: public static void main(String[] args) { > 41: TestFramework.runWithFlags("-XX:-ShortRunningLongLoop", "-XX:+TieredCompilation", "-XX:-UseCountedLoopSafepoints", "-XX:LoopUnrollLimit=0"); You can probably update the copyright year to 2025. test/hotspot/jtreg/compiler/longcountedloops/TestShortLoopLostLimit.java line 2: > 1: /* > 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. You can either update it to 2025 or just add 2025 additionally since you already proposed the initial PR in 2024. Same for the other files below. test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java line 36: > 34: > 35: // int RC is first eliminated by predication which causes assert > 36: // predicate to be added. Then the loop is transformed to make it Suggestion: // int RC is first eliminated by predication which causes Assertion // Predicates to be added. Then the loop is transformed to make it test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java line 38: > 36: // predicate to be added. Then the loop is transformed to make it > 37: // possible to optimize long RC. 
Finally unrolling happen which > 38: // require the assert predicate to have been properly copied when the Suggestion: // require the Assert Predicates to have been properly copied when the ------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2856875349 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099906356 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099907906 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099908542 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099911193 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099912075 From chagedorn at openjdk.org Wed May 21 10:26:06 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 10:26:06 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 08:00:56 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. 
>> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 75 commits: > > - Merge branch 'master' into JDK-8342692 > - more > - compilation fix > - new benchmark > - Add benchmark for manual mismatch loop > - review > - Merge branch 'master' into JDK-8342692 > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/graphKit.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - ... and 65 more: https://git.openjdk.org/jdk/compare/50a7755f...6100f757 src/hotspot/share/opto/castnode.cpp line 35: > 33: #include "opto/type.hpp" > 34: #include "castnode.hpp" > 35: #include "loopnode.hpp" You should add a `opto` prefix. `castnode.hpp` can be removed since it's already added above. src/hotspot/share/opto/castnode.cpp line 328: > 326: } > 327: > 328: bool CastLLNode::inner_loop_backedge(Node* proj) { `inner_loop_backedge()` and `cmp_used_at_inner_loop_exit_test()` can be made `static` and `used_at_inner_loop_exit_test()` `const`. src/hotspot/share/opto/loopnode.cpp line 1140: > 1138: }; > 1139: > 1140: class CloneShortLoopPredicatesVisitor : public PredicateVisitor { Maybe add a quick comment here that we clone and insert at the same loop. src/hotspot/share/opto/loopnode.cpp line 1148: > 1146: const NodeInShortLoopBody& node_in_loop_body, > 1147: PhaseIdealLoop* phase) > 1148: : _clone_predicate_to_loop(loop_head, node_in_loop_body, phase), To be more explicit: Suggestion: CloneShortLoopPredicatesVisitor(LoopNode* target_loop_head, const NodeInShortLoopBody& node_in_loop_body, PhaseIdealLoop* phase) : _clone_predicate_to_loop(target_loop_head, node_in_loop_body, phase), src/hotspot/share/opto/loopnode.cpp line 1156: > 1154: > 1155: void visit(const ParsePredicate& parse_predicate) override { > 1156: _clone_predicate_to_loop.clone_parse_predicate(parse_predicate, true); `clone_parse_predicate()` has its second parameter named `is_false_path_loop`. 
I think that no longer makes sense because we are now reusing the method outside Loop Unswitching.. Maybe we should just rename the parameter to `rewire_uncommon_proj_phi_inputs` which is eventually the name in `create_new_if_for_predicate()`. Additionally, we should rename `ParsePredicate::clone_to_unswitched_loop()` to `clone_to_loop()`. What do you think? src/hotspot/share/opto/loopnode.cpp line 1159: > 1157: parse_predicate.kill(_phase->igvn()); > 1158: } > 1159: void visit(const TemplateAssertionPredicate& template_assertion_predicate) override { Suggestion: void visit(const TemplateAssertionPredicate& template_assertion_predicate) override { src/hotspot/share/opto/loopnode.cpp line 1164: > 1162: } > 1163: }; > 1164: // If the loop is either statically known to run for a small enough number of iterations or if profile data indicates Suggestion: // If the loop is either statically known to run for a small enough number of iterations or if profile data indicates ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099744340 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099742420 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099794431 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099795386 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099780513 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099777107 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2099760430 From rcastanedalo at openjdk.org Wed May 21 10:34:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 21 May 2025 10:34:53 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> 
<8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> Message-ID: On Wed, 21 May 2025 09:18:01 GMT, Roland Westrelin wrote: > > I also think it would be good to investigate, separately, early elimination of dead array allocations, even after the integration of this work. Dead allocations may inhibit later optimizations so it would be good to eliminate them as early as possible anyway. One difficulty (not addressed in [c28f81a](https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58)) is that early array elimination should still generate the nonnegative array size check code. > > That makes sense. It would be useful to have a bugs to track that one. Turns out there is one already: [JDK-8180290](https://bugs.openjdk.org/browse/JDK-8180290), I just added a comment there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2897452323 From shade at openjdk.org Wed May 21 11:10:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 21 May 2025 11:10:15 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. 
> > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More touchups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24018/files - new: https://git.openjdk.org/jdk/pull/24018/files/51390bc2..f6bbc8d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=17-18 Stats: 20 lines in 2 files changed: 12 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From rkennke at openjdk.org Wed May 21 11:14:59 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 May 2025 11:14:59 GMT Subject: RFR: 8357370: Export supported GCs in JVMCI [v3] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 13:35:31 GMT, Roman Kennke wrote: >> I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. 
I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. >> >> Testing: >> - [x] build/test https://github.com/oracle/graal/pull/10904 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Align most trailing \s Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25325#issuecomment-2897553017 From rkennke at openjdk.org Wed May 21 11:15:00 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 21 May 2025 11:15:00 GMT Subject: Integrated: 8357370: Export supported GCs in JVMCI In-Reply-To: References: Message-ID: <3NC5H23jy_ZVGiP7FHXskbyotZZcJlEvU7O3idP0zk8=.b52e774d-fdf4-4ba9-86a6-dd158ffc9ead@github.com> On Tue, 20 May 2025 12:52:02 GMT, Roman Kennke wrote: > I need a way to detect in JVMCI if Shenandoah GC is supported (that is, built-in) by HotSpot. I need it for Shenandoah, because some vendors don't build it, but for cleanliness the relevant preprocessor constants should be exported for all GCs. > > Testing: > - [x] build/test https://github.com/oracle/graal/pull/10904 This pull request has now been integrated. Changeset: 2c126f19 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/2c126f1954435a5b4d6cdc367b7b5e8c91cfae63 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod 8357370: Export supported GCs in JVMCI Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/25325 From mchevalier at openjdk.org Wed May 21 11:20:57 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 21 May 2025 11:20:57 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: On Tue, 20 May 2025 21:51:07 GMT, Dean Long wrote: >> I'm not convinced this is relaxed enough or that it shouldn't overflow. 
Where we set trip_count: >> https://github.com/openjdk/jdk/blob/e961b13cd68bc352b86af17c7e53df8537519beb/src/hotspot/share/opto/loopTransform.cpp#L133-L141 >> we have a check that trip count is `< 2^32 - 1`, but it seems to me that the value of `trip_count` there might be 2^32 or 2^32-1 (same computation as the code I'm fixing). It's fine: if it would not fit in the `uint` we don't record it, fine, I guess. In the code I'm touching, `old_trip_count` is the value stored in the loop head previously. In the case where the new `trip_count` is 2^31, the old_trip_count hasn't been set since construction, so it's still `2^32 - 1` but without the exact flag (not sure what it means). So in the case new trip_count is 2^31, old_trip_count is 2^32-1: the `*2` overflows and we get `adjust_min_trip == true`. Which I presume is harmless (or maybe necessary?). With the version you suggest, we would guard against the overflow and allow `trip_count == 2^31-1`, but at the cost of crashing in the case of `trip_count == 2^31`, which seems possible to me (and still have the overflow happen in product). > > OK, I was thinking we needed to prevent the *2 below from overflowing. If we allow the *2 to overflow, then what's left is making sure the cast to uint doesn't change the value (overflow). To do that, we could relax the assert above to <= max_juint, or even better, use checked_cast to convert to uint below. Overflowing is probably not a good idea, but I don't think it's impossible, and my understanding is that it would nevertheless behave correctly. We can relax the assert to `<= max_juint` but I think that changes the intent of the assert. In my understanding, the point was just to make sure that the arithmetic seems reasonable: we start with an integer counter that can cover at most 2^32 values, so after unrolling, it can cover at most 2^31 values; if we find something else, we did something wrong. But we are bounding it by 2^31-2.
Maybe it is ok to be more strict in the assert than what could happen in real life, and bound trip_count by `<= (julong)max_juint/2` (instead of `<`). We could see if we find (with stress flags) the case of `trip_count == 2^31`, and see then how it behaves, and in the meantime the bound would be tight enough not to let overflow happen. I think this case is possible, but maybe some global (possibly accidental) invariant forbids it for now, and indeed I have never seen it happen so far. In short, we can just replace `<` by `<=` in the original code, and see if it's enough. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2100019389 From duke at openjdk.org Wed May 21 11:23:55 2025 From: duke at openjdk.org (Anjian-Wen) Date: Wed, 21 May 2025 11:23:55 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Wed, 21 May 2025 09:48:49 GMT, Hamlin Li wrote: > > Thanks for your review! I think the above test results may not fully reflect the difference in the impact of aligned and unaligned on the tail? I understand that if the dest address is aligned, the above aligned section has 0 to 4 fewer store instructions than the following section. I can remove it and test jmh to see how it performs > > Based on your last jmh data (check `MemorySegmentFillUnsafe.unsafe true 7` and `MemorySegmentFillUnsafe.unsafe false 7`, and others <= 7), they're the same. I guess the pipeline and store buffer deal with these continuous stores well enough. The last change to the code after L_fill_elements makes the byte stores happen one by one, so it seems that 'true' or 'false' may not affect it when the count is less than or equal to 7? I think the aligned section above is like a fast path that can reduce the number of instructions when the count is larger than 7, so maybe we should check the result when count > 7?
besides, I'm testing on that right now(delete the align tail part), we can find it out later ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2897579925 From roland at openjdk.org Wed May 21 11:24:49 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 11:24:49 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v26] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example). Then we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ...
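For illustration only (this is not code from the pull request): a minimal Java sketch of the kind of loop the description above is about, assuming a `long` induction variable and a `long` range check done with `Objects.checkIndex(long, long)`. The class name `ShortLongLoop` and the method `sum` are hypothetical, and whether C2 actually avoids the loop nest for a given run depends on the static and profile-based conditions quoted above.

```java
import java.util.Objects;

public class ShortLongLoop {
    // A long counted loop: the induction variable i is a long, and every
    // array access is guarded by a range check on a long index.
    static long sum(int[] data, long start, long stop) {
        long total = 0;
        for (long i = start; i < stop; i++) {
            // Objects.checkIndex(long, long) (Java 16+) throws
            // IndexOutOfBoundsException if i is outside [0, data.length).
            int idx = (int) Objects.checkIndex(i, (long) data.length);
            total += data[idx];
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        // Only 100 iterations: a "short running" loop in the sense used above.
        System.out.println(sum(data, 0, data.length)); // prints 4950
    }
}
```

With only a handful of iterations per call, a loop like this is the case where an outer loop whose backedge is never taken would be pure overhead.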
Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/e55393ee..af1d12bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=24-25 Stats: 7 lines in 2 files changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From dzhang at openjdk.org Wed May 21 11:41:08 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 21 May 2025 11:41:08 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 08:40:35 GMT, Dingli Zhang wrote: >> As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: >> Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. >> >> Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. 
>> >> ### Testing >> qemu-system 9.2.3 with UseRVV (ubuntu24.10): >> * [x] Run test/jdk/jdk/incubator/vector (fastdebug) >> * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove added ins_cost(VEC_COST) due to merging the main branch > - Merge branch 'master' into master-remove-ins_cost > - 8356924: RISC-V: Clean up cost for vector instructions Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25221#issuecomment-2897622644 From duke at openjdk.org Wed May 21 11:41:08 2025 From: duke at openjdk.org (duke) Date: Wed, 21 May 2025 11:41:08 GMT Subject: RFR: 8356924: RISC-V: Clean up cost for vector instructions [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 08:40:35 GMT, Dingli Zhang wrote: >> As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: >> Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. >> >> Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. >> >> ### Testing >> qemu-system 9.2.3 with UseRVV (ubuntu24.10): >> * [x] Run test/jdk/jdk/incubator/vector (fastdebug) >> * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) > > Dingli Zhang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Remove added ins_cost(VEC_COST) due to merging the main branch > - Merge branch 'master' into master-remove-ins_cost > - 8356924: RISC-V: Clean up cost for vector instructions @DingliZhang Your change (at version 44521d8a4e65224d29714103850f4359d59b8b75) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25221#issuecomment-2897626358 From epeter at openjdk.org Wed May 21 11:43:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 11:43:27 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v50] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
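For illustration only: a self-contained Java sketch of the two replacement ideas listed above, `#name` substitution and `$var` renaming. It does not use the Template Framework API; the class `DollarHashSketch`, the method `render`, and the `name_id` suffix scheme are assumptions made for this sketch.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarHashSketch {
    // Matches "#name" or "$name"; group 1 is the sigil, group 2 the name.
    private static final Pattern NAME = Pattern.compile("([#$])([A-Za-z_][A-Za-z0-9_]*)");

    // Hypothetical renderer: "#name" is replaced by the supplied value,
    // "$name" becomes "name_<id>" so that two uses of one body do not collide.
    static String render(String body, Map<String, String> hashArgs, int id) {
        Matcher m = NAME.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(2);
            String replacement;
            if (m.group(1).equals("#")) {
                replacement = hashArgs.get(name);
                if (replacement == null) {
                    throw new IllegalArgumentException("no value for #" + name);
                }
            } else {
                replacement = name + "_" + id;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "int $var = #value; $var++;";
        // The same body rendered twice yields distinct variable names.
        System.out.println(render(body, Map.of("value", "42"), 7));  // prints int var_7 = 42; var_7++;
        System.out.println(render(body, Map.of("value", "42"), 42)); // prints int var_42 = 42; var_42++;
    }
}
```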
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: verify hashtag and dollar names, add tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/4206d647..b4193e8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=48-49 Stats: 76 lines in 3 files changed: 73 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From dzhang at openjdk.org Wed May 21 11:45:56 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 21 May 2025 11:45:56 GMT Subject: Integrated: 8356924: RISC-V: Clean up cost for vector instructions In-Reply-To: References: Message-ID: <84G0n5YPazONEViKBCV_Pqi9knrWN5icJoX9iNHSx5A=.1a3c04d0-7e69-4139-be5f-25c28375941d@github.com> On Wed, 14 May 2025 04:04:11 GMT, Dingli Zhang wrote: > As mentioned in https://bugs.openjdk.org/browse/JDK-8285790 regarding the ARM64 vector instruct modifications: > Since the new rules are unique and setting different "ins_cost" makes no sense, we have switched to using the default cost. > > Currently, there is a similar situation on RISC-V. Over half of the instructions in riscv_v.ad do not include ins_cost definitions. Additionally, as RVV nodes are also unique, we can unify the format by removing these ins_cost entries from riscv_v.ad. > > ### Testing > qemu-system 9.2.3 with UseRVV (ubuntu24.10): > * [x] Run test/jdk/jdk/incubator/vector (fastdebug) > * [x] Run test/hotspot/jtreg/compiler/vectorapi (fastdebug) This pull request has now been integrated. 
Changeset: 108e454a Author: Dingli Zhang Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/108e454a042aaca2a36cd0d8087e7668e3cac29c Stats: 173 lines in 1 file changed: 0 ins; 173 del; 0 mod 8356924: RISC-V: Clean up cost for vector instructions Reviewed-by: fjiang, fyang, gcao ------------- PR: https://git.openjdk.org/jdk/pull/25221 From roland at openjdk.org Wed May 21 12:08:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 12:08:20 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v27] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example). Then we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/af1d12bf..66e960aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=25-26 Stats: 17 lines in 7 files changed: 2 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Wed May 21 12:08:21 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 21 May 2025 12:08:21 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 09:10:44 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 75 commits: >> >> - Merge branch 'master' into JDK-8342692 >> - more >> - compilation fix >> - new benchmark >> - Add benchmark for manual mismatch loop >> - review >> - Merge branch 'master' into JDK-8342692 >> - Update src/hotspot/share/opto/loopnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/graphKit.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - ... and 65 more: https://git.openjdk.org/jdk/compare/50a7755f...6100f757 > > src/hotspot/share/opto/loopnode.cpp line 1156: > >> 1154: >> 1155: void visit(const ParsePredicate& parse_predicate) override { >> 1156: _clone_predicate_to_loop.clone_parse_predicate(parse_predicate, true); > > `clone_parse_predicate()` has its second parameter named `is_false_path_loop`. 
I think that no longer makes sense because we are now reusing the method outside Loop Unswitching. Maybe we should just rename the parameter to `rewire_uncommon_proj_phi_inputs` which is eventually the name in `create_new_if_for_predicate()`. Additionally, we should rename `ParsePredicate::clone_to_unswitched_loop()` to `clone_to_loop()`. What do you think? Sounds good. New commit has this renaming. Question now is what we do with `ParsePredicate::trace_cloned_parse_predicate()` that wouldn't always print a message that makes sense. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2100117924 From bkilambi at openjdk.org Wed May 21 12:16:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 21 May 2025 12:16:55 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 07:49:10 GMT, Bhavana Kilambi wrote: >> Looks good. I'm assuming you've tested both SVE and Neon. > >> Looks good. I'm assuming you've tested both SVE and Neon. > > Yes, this was tested on both SVE and Neon (N1/V1/V2 architectures). > @Bhavana-Kilambi I'm getting timeouts with your new test: `compiler/vectorization/TestFloat16VectorOperations.java` > > At least on `linux-aarch64-debug` and `windows-x64-debug`, but not all tests have completed yet, so more could be failing. > > Not sure if it is relevant, but both had the extra flag `-XX:-UseTLAB`; we add this flag in our additional stress testing. Thanks for letting me know. This test by default takes a very long time to finish. As I added more flags to be tested (to cover various vector sizes), it probably ran far longer than anticipated and resulted in the timeout error. I will update with a patch to remove testing these extra flags for now. I did try to increase the default timeout value but the test continued to run for more than 20 min after which I had to terminate.
I feel the best option for now is to remove the additional tests for various vector sizes that I have added. I will update the patch soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2897732641 From mli at openjdk.org Wed May 21 12:46:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 21 May 2025 12:46:53 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Wed, 21 May 2025 11:20:50 GMT, Anjian-Wen wrote: > The last change to the code after L_fill_elements makes the byte stores happen one by one, so it seems that 'true' or 'false' may not affect it when the count is less than or equal to 7? Ah, I see. > I think the aligned section above is like a fast path that can reduce the number of instructions when the count is larger than 7, so maybe we should check the result when count > 7? besides, I'm testing on that right now(delete the align tail part), we can find it out later Thanks, let's see the test result. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2897835228 From chagedorn at openjdk.org Wed May 21 12:50:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 12:50:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 12:05:06 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 1156: >> >>> 1154: >>> 1155: void visit(const ParsePredicate& parse_predicate) override { >>> 1156: _clone_predicate_to_loop.clone_parse_predicate(parse_predicate, true); >> >> `clone_parse_predicate()` has its second parameter named `is_false_path_loop`. I think that no longer makes sense because we are now reusing the method outside Loop Unswitching.
Maybe we should just rename the parameter to `rewire_uncommon_proj_phi_inputs` which is eventually the name in `create_new_if_for_predicate()`. Additionally, we should rename `ParsePredicate::clone_to_unswitched_loop()` to `clone_to_loop()`. What do you think? > > Sounds good. New commit has this renaming. Question now is what we do with `ParsePredicate::trace_cloned_parse_predicate()` that wouldn't always print a message that makes sense. Good catch. That is now off as well. Additionally, it should probably be `TraceLoopUnswitching` and not `TraceLoopPredicate`. We could return the `ParsePredicate` from `clone_parse_predicate()` which is called from `CloneUnswitchedLoopPredicatesVisitor::visit()` and then call it from there. Maybe something like below?

Patch Suggestion (untested)

Index: src/hotspot/share/opto/predicates.hpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/hotspot/share/opto/predicates.hpp b/src/hotspot/share/opto/predicates.hpp
--- a/src/hotspot/share/opto/predicates.hpp (revision a0cdf36bdfeca9cd8b669859700d63d5ee627458)
+++ b/src/hotspot/share/opto/predicates.hpp (date 1747831252516)
@@ -288,8 +288,6 @@
   }
   static ParsePredicateNode* init_parse_predicate(const Node* parse_predicate_proj, Deoptimization::DeoptReason deopt_reason);
-  NOT_PRODUCT(static void trace_cloned_parse_predicate(bool is_false_path_loop,
-                                                       const ParsePredicateSuccessProj* success_proj);)
  public:
   ParsePredicate(Node* parse_predicate_proj, Deoptimization::DeoptReason deopt_reason)
@@ -320,8 +318,8 @@
     return _success_proj;
   }
-  ParsePredicate clone_to_unswitched_loop(Node* new_control, bool is_false_path_loop,
-                                          PhaseIdealLoop* phase) const;
+  ParsePredicate clone_to_unswitched_loop(Node* new_control, bool is_false_path_loop, PhaseIdealLoop* phase) const;
+  NOT_PRODUCT(void trace_cloned_parse_predicate(bool is_false_path_loop) const;)
   void kill(PhaseIterGVN& igvn) const;
 };
@@ -1158,10 +1156,11 @@
   ClonePredicateToTargetLoop(LoopNode* target_loop_head, const NodeInLoopBody& node_in_loop_body, PhaseIdealLoop* phase);
   // Clones the provided Parse Predicate to the head of the current predicate chain at the target loop.
-  void clone_parse_predicate(const ParsePredicate& parse_predicate, bool is_false_path_loop) {
+  ParsePredicate clone_parse_predicate(const ParsePredicate& parse_predicate, bool is_false_path_loop) {
     ParsePredicate cloned_parse_predicate = parse_predicate.clone_to_unswitched_loop(_old_target_loop_entry, is_false_path_loop, _phase);
     _target_loop_predicate_chain.insert_predicate(cloned_parse_predicate);
+    return cloned_parse_predicate;
   }
   void clone_template_assertion_predicate(const TemplateAssertionPredicate& template_assertion_predicate);
@@ -1189,6 +1188,8 @@
   using PredicateVisitor::visit;
   void visit(const ParsePredicate& parse_predicate) override;
+  void CloneUnswitchedLoopPredicatesVisitor::clone_parse_predicate(const ParsePredicate& parse_predicate,
+                                                                   bool is_false_path_loop);
   void visit(const TemplateAssertionPredicate& template_assertion_predicate) override;
 };
Index: src/hotspot/share/opto/predicates.cpp
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/hotspot/share/opto/predicates.cpp b/src/hotspot/share/opto/predicates.cpp
--- a/src/hotspot/share/opto/predicates.cpp (revision a0cdf36bdfeca9cd8b669859700d63d5ee627458)
+++ b/src/hotspot/share/opto/predicates.cpp (date 1747831362294)
@@ -87,7 +87,6 @@
   ParsePredicateSuccessProj* success_proj = phase->create_new_if_for_predicate(_success_proj, new_control, _parse_predicate_node->deopt_reason(), Op_ParsePredicate, is_false_path_loop);
-  NOT_PRODUCT(trace_cloned_parse_predicate(is_false_path_loop, success_proj));
   return ParsePredicate(success_proj, _parse_predicate_node->deopt_reason());
 }
@@ -97,11 +96,10 @@
 }
 #ifndef PRODUCT
-void ParsePredicate::trace_cloned_parse_predicate(const bool is_false_path_loop,
-                                                  const ParsePredicateSuccessProj* success_proj) {
-  if (TraceLoopPredicate) {
+void ParsePredicate::trace_cloned_parse_predicate(const bool is_false_path_loop) const {
+  if (TraceLoopUnswitching) {
     tty->print("Parse Predicate cloned to %s path loop: ", is_false_path_loop ? "false" : "true");
-    success_proj->in(0)->dump();
+    head()->dump();
   }
 }
 #endif // NOT PRODUCT
@@ -1108,11 +1106,18 @@
   if (_is_counted_loop && deopt_reason == Deoptimization::Reason_loop_limit_check) {
     return;
   }
-  _clone_predicate_to_true_path_loop.clone_parse_predicate(parse_predicate, false);
-  _clone_predicate_to_false_path_loop.clone_parse_predicate(parse_predicate, true);
+  clone_parse_predicate(parse_predicate, false);
+  clone_parse_predicate(parse_predicate, true);
   parse_predicate.kill(_phase->igvn());
 }
+void CloneUnswitchedLoopPredicatesVisitor::clone_parse_predicate(const ParsePredicate& parse_predicate,
+                                                                 const bool is_false_path_loop) {
+  const ParsePredicate cloned_parse_predicate =
+      _clone_predicate_to_true_path_loop.clone_parse_predicate(parse_predicate, false);
+  NOT_PRODUCT(cloned_parse_predicate.trace_cloned_parse_predicate(is_false_path_loop);)
+}
+
 // Clone the Template Assertion Predicate, which is currently found before the newly added unswitched loop selector,
 // to the true path and false path loop.
 void CloneUnswitchedLoopPredicatesVisitor::visit(const TemplateAssertionPredicate& template_assertion_predicate) {
The only downside is that we actually would not need the return value in product. But I guess that's negligible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2100204407 From epeter at openjdk.org Wed May 21 13:11:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 13:11:22 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v51] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: implement brackets for dollar and hashtag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/b4193e8b..4363a92a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=50 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=49-50 Stats: 87 lines in 4 files changed: 81 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From duke at openjdk.org Wed May 21 13:11:34 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 21 May 2025 13:11:34 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes Message-ID: The following nodes are added: - MinV / MaxV - AndV / OrV / XorV - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV - AddReductionVI / MulReductionVI ------------- Commit messages: - also exclude float/double from MaxV, MinV - filter subword types for MaxV, MinV - move vmax_reg code - add size to vand, vor, vxor nodes - compact switch case - correct opcodes - switch vsumsws to vadduwm - reductions only support int - remove long nodes - add MinL / MaxL nodes - ... 
and 11 more: https://git.openjdk.org/jdk/compare/3acfa9e4...e3c5f9f1 Changes: https://git.openjdk.org/jdk/pull/25318/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25318&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357304 Stats: 225 lines in 6 files changed: 225 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25318/head:pull/25318 PR: https://git.openjdk.org/jdk/pull/25318 From mdoerr at openjdk.org Wed May 21 13:11:35 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 21 May 2025 13:11:35 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes In-Reply-To: References: Message-ID: On Tue, 20 May 2025 06:47:45 GMT, David Linus Briemann wrote: > The following nodes are added: > - MinV / MaxV > - AndV / OrV / XorV > - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV > - AddReductionVI / MulReductionVI Very nice! Needs some adaptations. src/hotspot/cpu/ppc/assembler_ppc.hpp line 597: > 595: XVMINDP_OPCODE = (60u << OPCODE_SHIFT | 232u << 2), > 596: XVMAXSP_OPCODE = (60u << OPCODE_SHIFT | 192u << 2), > 597: XVMAXDP_OPCODE = (60u << OPCODE_SHIFT | 224u << 2), Should be "<< 3". src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 656: > 654: vmaxsw(dst, a, b); > 655: break; > 656: default: assert(false, "wrong opcode"); Maybe put each case in one line? Would be more compact. Up to you. src/hotspot/cpu/ppc/ppc.ad line 14307: > 14305: instruct vand(vecX dst, vecX src1, vecX src2) %{ > 14306: match(Set dst (AndV src1 src2)); > 14307: format %{ "VAND $dst,$src1,$src2\t// and vectors" %} Each of these nodes should have a size. src/hotspot/cpu/ppc/ppc.ad line 14391: > 14389: %} > 14390: ins_pipe(pipe_class_default); > 14391: %} Better move up to vmin_reg. 
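To make the `<< 3` remark on the `XVMAXSP_OPCODE` / `XVMAXDP_OPCODE` constants above concrete, here is a small sketch of the arithmetic only. It assumes that `OPCODE_SHIFT` is 26 (a 6-bit primary opcode in the top bits of a 32-bit instruction word) and that the extended opcode of these instructions sits above three low-order bits; both are assumptions read off the shape of the quoted constants, not facts taken from the patch. The class and method names are hypothetical.

```java
public class PpcOpcodeSketch {
    // Assumption: the primary opcode occupies the top 6 bits of a 32-bit word.
    static final int OPCODE_SHIFT = 26;

    // Combines a primary opcode with an extended opcode placed at extendedShift.
    static int encode(int primary, int extended, int extendedShift) {
        return (primary << OPCODE_SHIFT) | (extended << extendedShift);
    }

    public static void main(String[] args) {
        // Same primary (60) and extended (192) values as the quoted XVMAXSP line.
        System.out.printf("<< 2: 0x%08X%n", encode(60, 192, 2));
        System.out.printf("<< 3: 0x%08X%n", encode(60, 192, 3));
    }
}
```

The two results differ only in where the extended opcode lands, which is why a wrong shift would silently select a different bit pattern rather than fail to compile.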
------------- PR Review: https://git.openjdk.org/jdk/pull/25318#pullrequestreview-2855537618 PR Review Comment: https://git.openjdk.org/jdk/pull/25318#discussion_r2098838125 PR Review Comment: https://git.openjdk.org/jdk/pull/25318#discussion_r2098842435 PR Review Comment: https://git.openjdk.org/jdk/pull/25318#discussion_r2098844852 PR Review Comment: https://git.openjdk.org/jdk/pull/25318#discussion_r2098847728 From duke at openjdk.org Wed May 21 13:11:35 2025 From: duke at openjdk.org (David Linus Briemann) Date: Wed, 21 May 2025 13:11:35 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes In-Reply-To: References: Message-ID: <9jpqwEJ4w2uozz_77pq58Rt5bhJfS9rYvAXi0xM7viw=.879184a5-de94-48d8-a2d0-2bc2aa605e83@github.com> On Tue, 20 May 2025 20:42:48 GMT, Martin Doerr wrote: >> The following nodes are added: >> - MinV / MaxV >> - AndV / OrV / XorV >> - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV >> - AddReductionVI / MulReductionVI > > src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 656: > >> 654: vmaxsw(dst, a, b); >> 655: break; >> 656: default: assert(false, "wrong opcode"); > > Maybe put each case in one line? Would be more compact. Up to you. Good idea. Looks better when compacted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25318#discussion_r2099599445 From epeter at openjdk.org Wed May 21 13:26:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 13:26:25 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v52] In-Reply-To: References: Message-ID: <5hpiaBAqkmEAwK_96jLCl6ysDeVSgQyl-FABPKyNvjY=.2963d254-4f6b-4578-87e4-e487f9de4ed8@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). 
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: document syntax of dollar and hashtag ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/4363a92a..28b4eaa7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=50-51 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 21 13:42:19 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 13:42:19 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v53] In-Reply-To: References: Message-ID: <2WOGu4F76zeKo3VTQe4GNuA1rTZKVbyyhUEcAhrmjt4=.66c348ca-0016-4c9b-8af4-3007bde64c71@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: improve documentation in TestTemplate.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/28b4eaa7..cb17dfb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=52 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=51-52 Stats: 38 lines in 1 file changed: 32 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Wed May 21 13:46:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 13:46:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: Message-ID: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> On Wed, 21 May 2025 12:13:53 GMT, Bhavana Kilambi wrote: >>> Looks good. I'm assuming you've tested both SVE and Neon. >> >> Yes, this was tested on both SVE and Neon (N1/V1/V2 architectures). > >> @Bhavana-Kilambi I'm getting timeouts with your new test: `compiler/vectorization/TestFloat16VectorOperations.java` >> >> At least on `linux-aarch64-debug` and `windows-x64-debug`, but not all tests have completed yet, so more could be failing. >> >> Not sure if it is relevant, but both had extra flag `-XX:-UseTLAB`, we add this flag in our additional stress testing. > > Thanks for letting me know. This test by default takes a very long time to finish. 
As I added more flags (to test various vector sizes), it probably ran much longer than anticipated and resulted in the timeout error. I will update with a patch to remove testing these extra flags for now. I did try to increase the default timeout value, but the test continued to run for more than 20 min, after which I had to terminate it. I feel the best option for now is to remove the additional tests for various vector sizes that I have added. I will update the patch soon.

@Bhavana-Kilambi Ok, yes, 20min is a bit excessive. Generally, we should periodically run all vector tests with various `MaxVectorSize` settings. But doing that all the time is often too time consuming. For some specific tests, though, it can make sense to iterate over multiple sizes.

I wonder if you could also reduce the runtime of the test in other ways? Maybe reduce the warmup? It seems a bit excessive to do `10000` warmup iterations, each of which executes a loop with many iterations itself.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2898027287

From zzambers at openjdk.org Wed May 21 13:57:07 2025
From: zzambers at openjdk.org (Zdenek Zambersky)
Date: Wed, 21 May 2025 13:57:07 GMT
Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2]
In-Reply-To:
References:
Message-ID:

> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM.

Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains one commit:

  Fix of compiler tests for client VM

-------------

Changes: https://git.openjdk.org/jdk/pull/24262/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24262&range=01
  Stats: 161 lines in 71 files changed: 53 ins; 0 del; 108 mod
  Patch: https://git.openjdk.org/jdk/pull/24262.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24262/head:pull/24262

PR: https://git.openjdk.org/jdk/pull/24262

From zzambers at openjdk.org Wed May 21 14:02:59 2025
From: zzambers at openjdk.org (Zdenek Zambersky)
Date: Wed, 21 May 2025 14:02:59 GMT
Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option
In-Reply-To: <_Lde8-U9gDYDIMGDWg5aQKrWTq2suhAmkz97-xy54QU=.b73a1b9b-64dc-49f7-95d5-d00bd98a67eb@github.com>
References: <_Lde8-U9gDYDIMGDWg5aQKrWTq2suhAmkz97-xy54QU=.b73a1b9b-64dc-49f7-95d5-d00bd98a67eb@github.com>
Message-ID:

On Tue, 29 Apr 2025 16:33:41 GMT, Emanuel Peter wrote:

>>> Why not just add the compile flag `-XX:-IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think.
>>
>> I saw that approach sometimes used as well. (My little, probably unfounded concern, would be, that typos in args could then be silently ignored.)
>>
>> I can change my PR to use `-XX:-IgnoreUnrecognizedVMOptions` instead, if that approach is preferable.
>
> @zzambers Are you still working on this?

@eme64 Sorry for the late response. I updated the change set to use the approach with `-XX:+IgnoreUnrecognizedVMOptions`. Tested locally with `hotspot_compiler` tests. Rebased to current master.

The `LateInlinePrinting.java` test keeps using `@requires`, because the output printed by the client VM is different (when unrecognized options are ignored).
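The two approaches discussed in this thread can be sketched as jtreg test descriptions. This is an illustrative file, not one taken from the PR: the class name, the loop, and the choice of `-XX:LoopUnrollLimit` as an example of a C2-only flag are assumptions. Note that it is the `+` form of `IgnoreUnrecognizedVMOptions` that makes the VM skip flags it does not know.

```java
/*
 * @test id=requires-c2
 * @summary Skipped on VMs without C2, so the C2-only flag is always recognized.
 * @requires vm.compiler2.enabled
 * @run main/othervm -XX:LoopUnrollLimit=100 FlagGuardSketch
 */

/*
 * @test id=ignore-unrecognized
 * @summary Runs everywhere; a VM without C2 silently drops the flag it does not know.
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:LoopUnrollLimit=100 FlagGuardSketch
 */
public class FlagGuardSketch {
    // Hypothetical workload: sums 0 .. n-1 in a simple counted loop.
    static int sum(int n) {
        int total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        if (sum(10) != 45) {
            throw new RuntimeException("unexpected sum");
        }
    }
}
```

The first variant reduces coverage on client and minimal VMs but keeps typos in flags visible; the second keeps coverage but, as noted above, can hide a misspelled option. A test that checks compiler output, such as the `LateInlinePrinting.java` case, fits the first variant better.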
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2898080867

From jbhateja at openjdk.org Wed May 21 14:09:07 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Wed, 21 May 2025 14:09:07 GMT
Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill
Message-ID: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com>

The patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. ZRuntimeCallSpill is agnostic to live registers, so the resulting spill sequence should not modify the contents of those registers.

The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green.

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill

Changes: https://git.openjdk.org/jdk/pull/25351/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357267
  Stats: 63 lines in 1 file changed: 46 ins; 0 del; 17 mod
  Patch: https://git.openjdk.org/jdk/pull/25351.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351

PR: https://git.openjdk.org/jdk/pull/25351

From duke at openjdk.org Wed May 21 14:12:17 2025
From: duke at openjdk.org (Anjian-Wen)
Date: Wed, 21 May 2025 14:12:17 GMT
Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v15]
In-Reply-To:
References:
Message-ID:

> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory.
> This intrinsic considerably reduces the time spent in Unsafe setMemory.
>
> On my musebook, the JMH microbenchmark `micro:java.lang.foreign.MemorySegmentZeroUnsafe` gives the following results.
>
> Before the patch:
>
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...
Anjian-Wen has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8351140 - change all the t0 with tmp_reg - update code format - update code for optimize - change the name of dest from 'to' to 'dest' - RISC-V: Intrinsify Unsafe::setMemory ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/ff8c134e..4021ff5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=13-14 Stats: 28576 lines in 591 files changed: 11238 ins; 14443 del; 2895 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From chagedorn at openjdk.org Wed May 21 14:50:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 21 May 2025 14:50:52 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: <8q08MdGP6Oo_brVI0kxsPYm1XSH1wYEguhJbnD0i1LI=.ea512821-a21c-453b-96e6-b64f7e5fb94d@github.com> On Mon, 19 May 2025 06:43:38 GMT, Marc Chevalier wrote: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. Drive-by comment: Were you able to extract a regression test that does not require the stress peeling flag? 
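The arithmetic in the quoted description can be checked with a few lines of Java. This is only a sketch of the numbers: the class and method names are hypothetical, `isBelowCurrentBound` mirrors the quoted condition `< (julong)max_juint/2`, and `isWithinProposedBound` encodes the `<= 2^31` bound that the description says should hold. It does not reproduce the actual assert or the fix in the PR.

```java
public class UnrollBoundSketch {
    // max_juint as an unsigned 32-bit value: 2^32 - 1.
    static final long MAX_JUINT = Integer.toUnsignedLong(-1);

    // Integer division truncates, so this is floor((2^32 - 1) / 2) = 2^31 - 1.
    static long currentBound() {
        return MAX_JUINT / 2;
    }

    // Mirrors the quoted strict condition: trip_count < max_juint / 2.
    static boolean isBelowCurrentBound(long tripCount) {
        return tripCount < currentBound();
    }

    // The bound the description argues for: trip_count <= 2^31.
    static boolean isWithinProposedBound(long tripCount) {
        return tripCount <= (1L << 31);
    }

    public static void main(String[] args) {
        long tripCount = 1L << 31; // largest value the description considers legal
        System.out.println("current bound         = " + currentBound());
        System.out.println("below current bound   = " + isBelowCurrentBound(tripCount));
        System.out.println("within proposed bound = " + isWithinProposedBound(tripCount));
    }
}
```

Under these assumptions, a trip count of exactly 2^31 is rejected by the strict comparison even though it satisfies the bound derived in the JBS issue.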
------------- PR Comment: https://git.openjdk.org/jdk/pull/25295#issuecomment-2898241519 From epeter at openjdk.org Wed May 21 15:15:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 15:15:24 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v54] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: generateWithDataNamesAndScopes tutorial ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/cb17dfb7..3705c855 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=53 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=52-53 Stats: 223 lines in 1 file changed: 203 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From mdoerr at openjdk.org Wed May 21 15:18:00 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 21 May 2025 15:18:00 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes In-Reply-To: References: Message-ID: On Tue, 20 May 2025 06:47:45 GMT, David Linus Briemann wrote: > The following nodes are added: > - MinV / MaxV > - AndV / OrV / XorV > - MinReductionV / MaxReductionV / AndReductionV / 
OrReductionV / XorReductionV
> - AddReductionVI / MulReductionVI

src/hotspot/cpu/ppc/c2_MacroAssembler_ppc.cpp line 682:

> 680:   vsldoi(vTmp1, vTmp2, vTmp2, 4);         // vTmp1 <- [op(i1,i3), op(i2,i0), op(i3,i1), op(i0,i2)]
> 681:   fn_vec_op(opcode, vTmp1, vTmp1, vTmp2); // vTmp1 <- [op(i0,i1,i2,i3), op(i0,i1,i2,i3), op(i0,i1,i2,i3), op(i0,i1,i2,i3)]
> 682:   mfvsrwz(dst, vTmp1.to_vsr());           //

Maybe use R0 and avoid TEMP_DEF effect?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25318#discussion_r2100555225

From epeter at openjdk.org Wed May 21 15:18:09 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 21 May 2025 15:18:09 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v25]
In-Reply-To:
References:
Message-ID:

On Thu, 15 May 2025 14:48:39 GMT, Manuel Hässig wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix typo
>
> Thank you for the refactoring and your patience. I like the result and its simplicity a lot.
>
> I found a few typos, but otherwise it looks excellent.

@mhaessig @chhagedorn @robcasloz I'm done with all of my TODOs from above.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2898334663

From epeter at openjdk.org Wed May 21 15:24:18 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 21 May 2025 15:24:18 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v55]
In-Reply-To:
References:
Message-ID: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com>

> **Goal**
> We want to generate Java source code:
> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
> > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. 
This file is the heart of the Template Framework, and describes all the important features.
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76
>
> For a better experience, you may want to generate the `javadocs`:
> `javadoc -sourcepath test/hotspot/j...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  move order in tutorial

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24217/files
  - new: https://git.openjdk.org/jdk/pull/24217/files/3705c855..62d4c499

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=54
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=53-54

  Stats: 429 lines in 1 file changed: 201 ins; 200 del; 28 mod
  Patch: https://git.openjdk.org/jdk/pull/24217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217

PR: https://git.openjdk.org/jdk/pull/24217

From epeter at openjdk.org Wed May 21 15:37:04 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 21 May 2025 15:37:04 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v16]
In-Reply-To:
References:
Message-ID:

On Tue, 20 May 2025 17:06:34 GMT, Roberto Castañeda Lozano wrote:

>> A few more documentation suggestions, will continue reviewing this changeset over the next days.
>
>> @robcasloz I addressed all your comments :)
>
> Thanks @eme64!

Especially thanks for doing some hands-on testing, which brought up the necessity of `#{name}` for cases like `#{TYPE}_CON`, where `#TYPE_CON` did not work nicely, because it uses the name `TYPE_CON` rather than `TYPE`, which would have required an additional `let` definition.

Manuel and Christian have spent a few hours in offline meetings by now, and made a lot of good suggestions. It was hard work, but it has paid off. I'm very thankful.
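The `#{TYPE}_CON` versus `#TYPE_CON` point can be illustrated with a tiny stand-alone substituter. This is a hypothetical sketch written for this note and is not the Template Framework's implementation or API: the class `HashtagSketch`, the regular expression, and the `<undefined:...>` marker are all assumptions. It only shows why a greedy name match swallows `_CON` unless the name is delimited by braces.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashtagSketch {
    // Either "#{name}" (group 1) or "#name" (group 2).
    // \w+ is greedy and includes '_', so "#TYPE_CON" yields the name "TYPE_CON".
    private static final Pattern HASHTAG = Pattern.compile("#\\{(\\w+)\\}|#(\\w+)");

    // Replaces each hashtag with its value, or with a visible marker if undefined.
    static String expand(String text, Map<String, String> values) {
        Matcher matcher = HASHTAG.matcher(text);
        return matcher.replaceAll(result -> {
            String name = result.group(1) != null ? result.group(1) : result.group(2);
            String value = values.getOrDefault(name, "<undefined:" + name + ">");
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("TYPE", "int");
        System.out.println(expand("#{TYPE}_CON", values)); // braces end the name at TYPE
        System.out.println(expand("#TYPE_CON", values));   // name is read as TYPE_CON
    }
}
```

In this sketch the unbraced form would need a separate definition for `TYPE_CON`, which corresponds to the extra `let` definition mentioned above.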
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2898393162 From epeter at openjdk.org Wed May 21 15:37:05 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 21 May 2025 15:37:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v55] In-Reply-To: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com> References: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com> Message-ID: On Wed, 21 May 2025 15:24:18 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > move order in tutorial Looks like I can only add one at a time: ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2898399019 From never at openjdk.org Wed May 21 15:57:55 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 21 May 2025 15:57:55 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:14:07 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. >> >> Alternative solutions include: >> 1. Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. 
This is brittle as it assumes knowledge about how much address space is needed (which is turn depends on how many libgraal compiler threads are created). >> 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. >> >> I think the solution in this PR is the most robust for the long term. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > consolidate JVMCI eager initialization Silently disabling the top level JIT seems like a bad default behaviour for customers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2898465575 From sviswanathan at openjdk.org Wed May 21 16:01:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 16:01:52 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v5] In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Wed, 21 May 2025 01:57:35 GMT, Jatin Bhateja wrote: >> PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. >> AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. >> >> e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. 
>> >> >> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 >> EVEX OP MR SIB DISP IMM >> --------------|----|----|----|---------------|-----| >> 62 6b c1 40 25 84 ec 40 30 20 10 ff >> >> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 >> For full vector width operation, scalar matches with vector size, hence scale N = 64 >> effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 >> EVEX OP MR SIB DISP IMM >> -------------|----|---|---|-----------|---| >> 62 6b c1 40 25 44 ec 01 ff >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review Resoultions Looks good to me. Thanks for fixing this issue. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25021#pullrequestreview-2858346557 From dnsimon at openjdk.org Wed May 21 16:15:52 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 16:15:52 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 15:54:58 GMT, Tom Rodriguez wrote: > Silently disabling the top level JIT seems like a bad default behaviour for customers. This does not disable the JIT, just suppresses a specific type of error (i.e., reserving virtual address space for the SVM heap) when trying to initialize libgraal at startup. Importantly, the error of badly specified libgraal options still causes a VM exit. What alternative solution would you prefer? One of the other 2 proposals in the PR description? Or something else? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2898519536 From sviswanathan at openjdk.org Wed May 21 16:24:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 16:24:53 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Mon, 12 May 2025 12:11:36 GMT, Jatin Bhateja wrote: >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) > >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) > > Please use the latest version > @jatin-bhateja @sviswa7 Can you explain the impact of the `EVEX_HVM`, `EVEX_QVM` etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct? @eme64 In EVEX the displacement for memory in the addressing mode is encoded using compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc denote tuple type and are used to determine the scaling factor for displacement. Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in [Intel SDM Volume 2](https://cdrdv2.intel.com/v1/dl/getContent/671110). So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is non zero. 
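The effect of a wrong tuple type can be sketched with a few lines of arithmetic. This is only an illustration of the disp8*N rule from the SDM section cited above, not HotSpot assembler code: the class `Disp8Sketch` and its methods are hypothetical, and the "wrong" scale of 32 is an assumed mistake chosen for the example.

```java
// Hypothetical illustration of EVEX compressed displacement (disp8*N).
class Disp8Sketch {
    // A displacement can use the 1-byte form only if it is a multiple of N
    // and the scaled value fits in a signed byte.
    static boolean fitsCompressedDisp8(int disp, int n) {
        return disp % n == 0 && disp / n >= -128 && disp / n <= 127;
    }

    // The CPU reconstructs the displacement as disp8 * N, where N follows
    // from the instruction's real tuple type, not from what the assembler assumed.
    static int decode(byte disp8, int actualN) {
        return disp8 * actualN;
    }

    public static void main(String[] args) {
        int n = 64; // full 64-byte vector, as in the vpternlogq example in this thread
        System.out.println(fitsCompressedDisp8(0x40, n));       // true
        System.out.println(fitsCompressedDisp8(0x10203040, n)); // false: scaled value exceeds a byte

        byte correct = (byte) (0x40 / 64); // assembler used the right N
        byte wrong = (byte) (0x40 / 32);   // assembler assumed N = 32 (assumed mistake)
        System.out.println(decode(correct, 64)); // 64, the intended offset
        System.out.println(decode(wrong, 64));   // 128, a different address
    }
}
```

With a zero displacement both encodings decode to 0, which matches the remark that a wrong tuple type only shows up as wrong results when the displacement is non-zero.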
------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2898543098 From bkilambi at openjdk.org Wed May 21 16:46:54 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 21 May 2025 16:46:54 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> Message-ID: On Wed, 21 May 2025 13:44:29 GMT, Emanuel Peter wrote: >>> @Bhavana-Kilambi I'm getting timeouts with your new test: `compiler/vectorization/TestFloat16VectorOperations.java` >>> >>> At least on `linux-aarch64-debug` and `windows-x64-debug`, but not all tests have completed yet, so more could be failing. >>> >>> Not sure if it is relevant, but both had extra flag `-XX:-UseTLAB`, we add this flag in our additional stress testing. >> >> Thanks for letting me know. This test by default takes a very long time to finish. As I added more flags (to test various vector sizes) to be tested, it probably ran much longer than anticipated and resulted in the timeout error. I will update with a patch to remove testing these extra flags for now. I did try to increase the default timeout value but the test continued to run for more than 20 min after which I had to terminate. I feel the best for now is to remove the additional tests for various vector sizes that I have added. I will update the patch soon. > > @Bhavana-Kilambi Ok, yes, 20min is a bit excessive. > > Generally, we should periodically run all vector tests with various `MaxVectorSize` settings. But doing that all the time is often too time consuming. For some specific tests, it can make sense though to iterate over multiple sizes. > > I wonder if you could also reduce the runtime of the test in other ways? Maybe reduce the warmup?
It seems a bit excessive to do `10000` warmup iterations, which each execute a loop with many iterations themselves. Hi @eme64 I removed the @Warmup entirely and the test does pass on aarch64. Although I am a bit afraid to fully remove it as it could sometimes lead to the loop not being warm enough for c2 vectorization to kick in. I haven't tried with different values of the warmup iterations though. Do you think it's ok to remove it entirely? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2898601326 From never at openjdk.org Wed May 21 17:55:55 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 21 May 2025 17:55:55 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 12:14:07 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Instead of exiting the VM, the failure should be silent (unless `-XX:+PrintCompilation` is enabled) as the VM can continue without libgraal, albeit in a crippled state. This PR implements this solution. >> >> Alternative solutions include: >> 1. 
Trying to adjust the values used with `ulimit -v` in the tests to accommodate the [virtual address reservations](https://github.com/oracle/graal/blob/69f10d3d658a6aeca3d5ce59c64af6a18336f14c/substratevm/src/com.oracle.svm.core.genscavenge/src/com/oracle/svm/core/genscavenge/AddressRangeCommittedMemoryProvider.java#L150) needed by libgraal. This is brittle as it assumes knowledge about how much address space is needed (which is turn depends on how many libgraal compiler threads are created). >> 2. Add a `@requires !vm.libgraal.jit` guard to the tests so they are not run when libgraal is in use. >> >> I think the solution in this PR is the most robust for the long term. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > consolidate JVMCI eager initialization After this executes we have a running JVM without a working libgraal right? It might be rare in a user environment but it's very confusing behaviour for an end user. Might this not occur in a virtualized environment? I agree it would be very hard to make libgraal robust in the face of such a limited virtual address space so I think disabling the tests for libgraal would be easiest. Or both of those tests could probably just run with -Xint to avoid this completely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2898777546 From sviswanathan at openjdk.org Wed May 21 20:04:54 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 20:04:54 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Mon, 12 May 2025 12:11:36 GMT, Jatin Bhateja wrote: >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! 
:) > >> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) > > Please use the latest version > > @jatin-bhateja @sviswa7 Can you explain the impact of the `EVEX_HVM`, `EVEX_QVM` etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct? > > @eme64 In EVEX the displacement for memory in the addressing mode is encoded using compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc denote tuple type and are used to determine the scaling factor for displacement. Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in [Intel SDM Volume 2](https://cdrdv2.intel.com/v1/dl/getContent/671110). So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is non zero. For testing, the best way would be to create a SIMD instruction encoding test tool on similar lines as https://github.com/openjdk/jdk/commit/52d752c43b3a9935ea97051c39adf381084035cc in a separate future PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2899098119 From liach at openjdk.org Wed May 21 20:16:01 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 May 2025 20:16:01 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 06:08:37 GMT, Jaikiran Pai wrote: >> Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains eight additional commits since the last revision: >> >> - Move intrinsic to be a subsection; just one most common function of the annotation >> - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate >> - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate >> - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java >> >> Co-authored-by: Raffaello Giulietti >> - Shorter first sentence >> - Updates, thanks to John >> - Refine validation and defensive copying >> - 8355223: Improve documentation on @IntrinsicCandidate > > src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 39: > >> 37: * Intrinsification >> 38: * The most frequently special treatment is intrinsification, which replaces a >> 39: * candidate method's body, bytecode or native, with handwritten platform > > Is this sentence missing the word "code" after "native"? Should it have been: > >> bytecode or native code, ... This is referring to native methods' bodies. I think "bytecode or native" is sufficient to summarize the executable method body in the Java Language/Virtual Machine Specification. > src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 50: > >> 48: * For example, the bytecodes of a candidate method may be executed by lower >> 49: * compilation tiers of VM execution, while higher compilation tiers may replace >> 50: * the bytecodes with specialized assembly code and/or compiler IR. Therefore, > >> while higher compilation tiers may replace the bytecodes with specialized assembly code and/or compiler IR > > Is there ever a case, where for a `@IntrinsicCandidate` method, the runtime will choose to execute the intrinsic for that method for a certain duration and then at a later point in time replace the intrinsic with compiler generated code? In other words, once the runtime executes the intrinsic implementation for a `@IntrinsicCandidate` method, will the method's implementation be switched to anything else during the lifetime of an application? We cannot rule it out, but this sentence begins "for example" meaning this is just one scenario and is not exhaustive.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101082191 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101080051 From dnsimon at openjdk.org Wed May 21 20:41:35 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:41:35 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v3] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. Doug Simon has updated the pull request incrementally with three additional commits since the last revision: - tests that use 'ulimit -v' should run with -Xint - Revert "do not exit VM if libjvmci env creation fails" This reverts commit 7eb259b92553669065db57d230476cf465a67d02. - Revert "consolidate JVMCI eager initialization" This reverts commit 32986d1a2b741ee8c9090cefbecc148bb8fbd7e4. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/32986d1a..1a79617e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=01-02 Stats: 55 lines in 9 files changed: 30 ins; 18 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Wed May 21 20:41:35 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:41:35 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 17:53:13 GMT, Tom Rodriguez wrote: > Or both of those tests could probably just run with -Xint to avoid this completely. I've reverted to this solution - thanks for the suggestion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2899176436 From dnsimon at openjdk.org Wed May 21 20:46:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:46:04 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v4] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) 
> > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: added comments justifying use of -Xint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/1a79617e..b0d45b1b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=02-03 Stats: 7 lines in 2 files changed: 5 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Wed May 21 20:46:05 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:46:05 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v3] In-Reply-To: References: Message-ID: <4TOJwaT4xDVYnzB1co2JKSILNBV5lwBUduMZHRtquSU=.754489ed-035f-427b-8903-f5edcd0309cd@github.com> On Wed, 21 May 2025 20:41:35 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) 
>> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with three additional commits since the last revision: > > - tests that use 'ulimit -v' should run with -Xint > - Revert "do not exit VM if libjvmci env creation fails" > > This reverts commit 7eb259b92553669065db57d230476cf465a67d02. > - Revert "consolidate JVMCI eager initialization" > > This reverts commit 32986d1a2b741ee8c9090cefbecc148bb8fbd7e4. Tested locally with a build that includes libgraal. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2899184608 From dnsimon at openjdk.org Wed May 21 20:59:33 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 21 May 2025 20:59:33 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. 
> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: removed trailing space ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25307/files - new: https://git.openjdk.org/jdk/pull/25307/files/b0d45b1b..3201a5d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25307&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25307.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25307/head:pull/25307 PR: https://git.openjdk.org/jdk/pull/25307 From mchevalier at openjdk.org Wed May 21 21:05:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 21 May 2025 21:05:52 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:43:38 GMT, Marc Chevalier wrote: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. Nope, I couldn't find anything like that before, and I didn't manage to trick the stressed case to reproduce without the stress flag. But I'm not sure what that implies: if unroll happens in such a case, we would hit the assert, which manual computation shows as too restrictive (that is not true in the general case). That means that if we change policy_peeling in some ways, we could hit the assert in do_unroll. I don't really see how the fact we use a stress flag to reproduce changes anything: we do fix bugs that manifest only with some stress flags because we are always one irrelevant change away from having it happen for real. 
But if you think it's not worth fixing as long as we can't find a case that makes it happen without stress flag, fine with me! I don't mind closing the issue either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25295#issuecomment-2899225819 From sviswanathan at openjdk.org Wed May 21 21:30:59 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 21:30:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v22] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 01:24:51 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > Fix for UseAddressNop related failure src/hotspot/cpu/x86/vm_version_x86.cpp line 1635: > 1633: UseStoreImmI16 = false; // don't use it on Intel cpus > 1634: } > 1635: if (cpu_family() == 6 || cpu_family() == 15 || cpu_family() == 18 || cpu_family() == 19) { This could be written as if (is_P6_or_later() || cpu_family() == 15) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2100918961 From liach at openjdk.org Wed May 21 21:31:16 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 May 2025 21:31:16 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v7] In-Reply-To: References: Message-ID: > In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared 
this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - More review updates - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate - Move intrinsic to be a subsection; just one most common function of the annotation - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java Co-authored-by: Raffaello Giulietti - Shorter first sentence - Updates, thanks to John - Refine validation and defensive copying - 8355223: Improve documentation on @IntrinsicCandidate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24777/files - new: https://git.openjdk.org/jdk/pull/24777/files/317dd27a..a312d92b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24777&range=05-06 Stats: 45626 lines in 1611 files changed: 26514 ins; 10851 del; 8261 mod Patch: https://git.openjdk.org/jdk/pull/24777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24777/head:pull/24777 PR: https://git.openjdk.org/jdk/pull/24777 From liach at openjdk.org Wed May 21 21:31:16 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 May 2025 21:31:16 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: <3iH45ZNknFrbSpG6duwRFvGzrbzEGxCY0EfCF1nd6SU=.11552086-a4a7-455a-8523-172f581618d7@github.com> On Fri, 16 May 2025 19:45:16 GMT, John R Rose wrote: > Or just: s/, unlike the other 
methods// Removed mention of "the other methods". > src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 90: > >> 88: * intrinsic.) For example, the documentation can simply say that the result is >> 89: * undefined if a race happens. However, race conditions must not lead to >> 90: * program failures or type safety breaches, as listed above. > > Maybe add a teaching paragraph: > >> Reasoning about such race conditions is difficult, but it is a necessary skill when working with intrinsics that can observe racing shared variables. One example of a tolerable race is a repeated read of a shared reference. This only works if the algorithm takes no action based on the first read, other than deciding to perform the second read; it must "forget what it saw" in the first read. This is why the array-mismatch intrinsics can sometimes report a tentative search hit (maybe using vectorized code), which can then be confirmed (by scalar code) as the caller makes a fresh and independent observation. > > (This is done when the array mismatch logic performs NaN-folding. I just noticed that the NaN-folding code in ArraysSupport is slightly incorrect with respect to races!) I have also appended this teaching paragraph as an inline blockquote note after the current paragraph.
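To make the "tentative hit, then confirm" idea in the quoted teaching paragraph concrete, here is a minimal sketch. It is not the `ArraysSupport` code: the class `TentativeMismatch` and both method names are hypothetical, and `tentativeMismatch` is only a scalar stand-in for what a vectorized intrinsic might report. The point it tries to show is that the confirming step re-reads both elements and reuses nothing from the first pass except the index.

```java
/** Illustrative only: a race-tolerant "tentative hit, then confirm" search. */
final class TentativeMismatch {

    /**
     * Hypothetical stand-in for a vectorized intrinsic. Under a race its
     * result may already be stale, so callers treat it as a hint only.
     */
    static int tentativeMismatch(int[] a, int[] b, int from, int limit) {
        for (int i = from; i < limit; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }

    /** Same contract as Arrays.mismatch(int[], int[]) when no thread races. */
    static int mismatch(int[] a, int[] b) {
        int limit = Math.min(a.length, b.length);
        int from = 0;
        while (from < limit) {
            int hint = tentativeMismatch(a, b, from, limit);
            if (hint < 0) {
                break; // no candidate reported in the common prefix
            }
            // Fresh, independent reads: only the index survives from the
            // first pass, none of the element values it observed.
            int x = a[hint];
            int y = b[hint];
            if (x != y) {
                return hint; // confirmed by the second observation
            }
            from = hint + 1; // the hint went stale; keep searching
        }
        return a.length == b.length ? -1 : limit;
    }
}
```

Without concurrent writers the confirming read always agrees with the hint, so the loop body runs at most once per call; the retry branch only matters when another thread modifies the arrays between the two observations.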
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101197921 PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101199974 From liach at openjdk.org Wed May 21 21:31:16 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 May 2025 21:31:16 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:11:39 GMT, Chen Liang wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 50: >> >>> 48: * For example, the bytecodes of a candidate method may be executed by lower >>> 49: * compilation tiers of VM execution, while higher compilation tiers may replace >>> 50: * the bytecodes with specialized assembly code and/or compiler IR. Therefore, >> >>> while higher compilation tiers may replace the bytecodes with specialized assembly code and/or compiler IR >> >> Is there ever a case, where for a `@IntrinsicCandidate` method, the runtime will choose to execute the instrinsic for that method for a certain duration and then at a later point in time replace the intrinsic with compiler generated code? In other words, once the runtime executes the intrinsic implementation for a `@IntrinsicCandidate` method, will the method's implementation be switched to anything else during the lifetime of an application? > > We cannot rule it out, but this sentence begins "for example" meaning this is just one scenario and is not exhaustive. To address your concern, I have reworded: * During execution, intrinsification may happen and may be rolled back at any * moment; this loading and unloading process may happen zero to many times. 
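Since intrinsification can be applied and rolled back at any time, the Java body of a candidate method and its intrinsic have to be indistinguishable to callers. As a rough illustration (an assumption-laden sketch, not part of the PR): `Integer.bitCount` is an intrinsic candidate in the JDK, and the hand-written `bitCountJava` below is a plain-Java population count that must always agree with it, whichever implementation the VM happens to be running. The class name `BitCountEquivalence` and its helpers are hypothetical.

```java
/** Illustrative only: the Java path and the intrinsic path must agree. */
final class BitCountEquivalence {

    /** Plain-Java population count (the classic bit-parallel reduction). */
    static int bitCountJava(int i) {
        i = i - ((i >>> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
        i = (i + (i >>> 4)) & 0x0f0f0f0f;
        i = i + (i >>> 8);
        i = i + (i >>> 16);
        return i & 0x3f;
    }

    /**
     * True if the library method (possibly intrinsified, possibly not)
     * and the plain-Java version return the same value for x.
     */
    static boolean agrees(int x) {
        return Integer.bitCount(x) == bitCountJava(x);
    }

    /** Checks agreement over a small range of inputs. */
    static boolean agreesOnRange(int from, int to) {
        for (int x = from; x < to; x++) {
            if (!agrees(x)) {
                return false;
            }
        }
        return true;
    }
}
```

A check like this cannot tell which implementation ran, which is exactly the property the reworded Javadoc relies on.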
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101198872 From rriggs at openjdk.org Wed May 21 21:33:56 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Wed, 21 May 2025 21:33:56 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: <3iH45ZNknFrbSpG6duwRFvGzrbzEGxCY0EfCF1nd6SU=.11552086-a4a7-455a-8523-172f581618d7@github.com> References: <3iH45ZNknFrbSpG6duwRFvGzrbzEGxCY0EfCF1nd6SU=.11552086-a4a7-455a-8523-172f581618d7@github.com> Message-ID: On Wed, 21 May 2025 21:28:07 GMT, Chen Liang wrote: >> src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java line 90: >> >>> 88: * intrinsic.) For example, the documentation can simply say that the result is >>> 89: * undefined if a race happens. However, race conditions must not lead to >>> 90: * program failures or type safety breaches, as listed above. >> >> Maybe add a teaching paragraph: >> >>> Reasoning about such race conditions is difficult, but it is a necessary skill when working with intrinsics that can observe racing shared variables. One example of a tolerable race is a repeated read of a shared reference. This only works if the algorithm takes no action based on the first read, other than deciding to perform the second read; it must "forget what it saw" in the first read. This is why the array-mismatch intrinsics can sometimes report a tentative search hit (maybe using vectorized code), which can then be confirmed (by scalar code) as the caller makes a fresh and independent observation. >> >> (This is done when the array mismatch logic performs NaN-folding. I just noticed that the NaN-folding code in ArraysSupport is slightly incorrect with respect to races!) > > I have appened this teaching paragraph also as a blockquote inlined note after this current paragraph. 
This unnecessary detail goes well beyond the description of the annotation and is more of a design doc for VM implementation, so this is not really the best place for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101204688 From liach at openjdk.org Wed May 21 21:43:55 2025 From: liach at openjdk.org (Chen Liang) Date: Wed, 21 May 2025 21:43:55 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v6] In-Reply-To: References: <3iH45ZNknFrbSpG6duwRFvGzrbzEGxCY0EfCF1nd6SU=.11552086-a4a7-455a-8523-172f581618d7@github.com> Message-ID: On Wed, 21 May 2025 21:31:39 GMT, Roger Riggs wrote: >> I have also appended this teaching paragraph as an inline blockquote note after the current paragraph. > > This unnecessary detail goes well beyond the description of the annotation and is more of a design doc for VM implementation, so this is not really the best place for it. This is a general note for intrinsic APIs - in rendering, this is a blockquote, so it is nested as if it is an inlined footnote. It does not impact the readability of the main body. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24777#discussion_r2101216837 From sviswanathan at openjdk.org Wed May 21 22:31:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 22:31:58 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints.
> These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 82: > 80: MacroAssembler* masm = _masm; > 81: if (VM_Version::supports_apx_f()) { > 82: __ push(rax); if _result is not equal to rax this also could be pushp rax here and popp rax in restore(). src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 83: > 81: if (VM_Version::supports_apx_f()) { > 82: __ push(rax); > 83: __ push(rcx); This could be __ pushp(rcx). src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 92: > 90: // Note: For PPX to work properly, a PPX-marked PUSH2 (respectively, POP2) should always > 91: // be matched with a PPX-marked POP2 (PUSH2), not with two PPX-marked POPs (PUSHs). > 92: __ pushp(rcx); This is saving old rsp on stack and restored using __ movptr(rsp, Address(rsp)) on the other end in restore(). So this should be __ push(rcx) and not __ pushp(rcx) as there is no corresponding __ popp() instruction for this pushp. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 185: > 183: // Re-instantiate original stack pointer. > 184: __ movptr(rsp, Address(rsp)); > 185: __ pop(rcx); This could be __ popp(rcx). 
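The pairing rule from the quoted source comment (a PPX-marked push is undone by a PPX-marked pop of the same width) can be sketched as a tiny checker. This is a simplified model written for illustration, not HotSpot code: the class `PpxPairingCheck`, the `Op` enum and `unmatchedPpx` are hypothetical names, and the model ignores 16-byte alignment and everything else about real hardware behaviour. Pushes that are never popped (for example because `rsp` is reloaded from memory instead) are only counted when they carry the PPX mark, which mirrors the review remark that the saved-`rsp` slot should use a plain `push`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Illustrative only: counts violations of the PPX push/pop pairing rule. */
final class PpxPairingCheck {

    /** Stack operations in this model: direction, register count, PPX mark. */
    enum Op {
        PUSH(true, 1, false), PUSHP(true, 1, true), PUSH2P(true, 2, true),
        POP(false, 1, false), POPP(false, 1, true), POP2P(false, 2, true);

        final boolean push;
        final int regs;
        final boolean ppx;

        Op(boolean push, int regs, boolean ppx) {
            this.push = push;
            this.regs = regs;
            this.ppx = ppx;
        }
    }

    /**
     * Returns the number of pairing violations: pops whose width or PPX
     * mark differs from the push they undo, PPX-marked pops with nothing
     * to undo, and PPX-marked pushes that are never popped.
     */
    static int unmatchedPpx(List<Op> ops) {
        Deque<Op> stack = new ArrayDeque<>();
        int violations = 0;
        for (Op op : ops) {
            if (op.push) {
                stack.push(op);
            } else if (stack.isEmpty()) {
                if (op.ppx) {
                    violations++;
                }
            } else {
                Op top = stack.pop();
                if (top.regs != op.regs || top.ppx != op.ppx) {
                    violations++;
                }
            }
        }
        for (Op leftover : stack) {
            if (leftover.ppx) {
                violations++; // PPX-marked push with no matching pop
            }
        }
        return violations;
    }
}
```

In this model a `PUSH2P`/`POP2P` pair is clean, a `PUSH2P` undone by two `POPP`s is flagged, and a trailing `PUSHP` that is discarded by reloading `rsp` is flagged while a trailing plain `PUSH` is not.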
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101275404 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101266706 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101264344 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2101267521 From swen at openjdk.org Wed May 21 23:08:54 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 21 May 2025 23:08:54 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: <3uQVp0jGv5C58qUYuNkBTHakAXEp3eBm_6poKUgRfJI=.b31bd4a4-0e8a-471e-86fd-b2957a33870c@github.com> On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead. >>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Idea`. There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. >>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? 
Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64 , I understand your concern. In this patch , I check the usage of all `loadB` nodes and only allow they have only single usage into `OrNode`, I also check the `OrNode` as well. So I think it will not cause the trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> For this case, because l1 has other usage, all these loads will not be merged. >> >> In my previous patch, I tried to extract value from merged `LoadNode` if origin `loadB` has other usage, such as used by uncommon trap. You can find them in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 and search MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced `LoadB` used by uncommon trap, I think range check smearing remove all the uncommon trap usages. So I revert it to make code simple. In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to evaluate cost of new instructions which used for extracting and benefit of merged load. To simplify, I choose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? > > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > >> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. 
> > This is really the pattern we use in `Idea`. We replace the node at the bottom of an expression with a new node (or new expression). @eme64 Can we complete the integration in JDK25 before June 5th? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2899460503 From sparasa at openjdk.org Wed May 21 23:35:39 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 23:35:39 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v34] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. > > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with five additional commits since the last revision: - refactor to use is_P6_or_later() - rename byte1 to opcode_byte - rename evex_opcode_prefix_and_encode as emit_eevex_or_demote - rename evex to eevex in method names - reset swap=false as default ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/9a517c2f..110db142 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=32-33 Stats: 119 lines in 3 files changed: 0 ins; 0 del; 119 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sviswanathan at openjdk.org Wed May 21 23:36:51 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 23:36:51 GMT Subject: RFR: 8357267: ZGC: Handle APX 
EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin @xmas92 Could you please also take a look at this? Intel APX add additional GPR registers (R16 - R31). Our understanding is that these also need to be saved and restored as part of ZRuntimeCallSpill. Is that correct? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2899502203 From sparasa at openjdk.org Wed May 21 23:43:57 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 23:43:57 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v22] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 18:26:40 GMT, Sandhya Viswanathan wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix for UseAddressNop related failure > > src/hotspot/cpu/x86/vm_version_x86.cpp line 1635: > >> 1633: UseStoreImmI16 = false; // don't use it on Intel cpus >> 1634: } >> 1635: if (cpu_family() == 6 || cpu_family() == 15 || cpu_family() == 18 || cpu_family() == 19) { > > This could be written as if (is_P6_or_later() || cpu_family() == 15) This got fixed as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2101360424 From sviswanathan at openjdk.org Wed May 21 23:50:01 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 21 May 2025 23:50:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v34] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 23:35:39 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
>> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with five additional commits since the last revision: > > - refactor to use is_P6_or_later() > - rename byte1 to opcode_byte > - rename evex_opcode_prefix_and_encode as emit_eevex_or_demote > - rename evex to eevex in method names > - reset swap=false as default Changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2859399896 From sparasa at openjdk.org Wed May 21 23:53:00 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 21 May 2025 23:53:00 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: <9lqWA3ERAtAuuUDJNS0gIQDtN-RTOH_C-sxC_4ALH5g=.46c2438c-bdf6-43e1-847d-56c6c51e5454@github.com> References: <9lqWA3ERAtAuuUDJNS0gIQDtN-RTOH_C-sxC_4ALH5g=.46c2438c-bdf6-43e1-847d-56c6c51e5454@github.com> Message-ID: <3Tq3ef_yFX56l4-rKyLFx_r7iJfrHkOWJ9K65beGmV8=.5b8d258f-1d40-47d4-9631-ad02e8dc0ecd@github.com> On Wed, 7 May 2025 21:22:49 GMT, Emanuel Peter wrote: >> Hi Sandhya (@sviswa7) and Jatin (@jatin-bhateja), >> >> Could you please review the refactored changes? >> >> Thanks, >> Vamsi > > @vamsi-parasa @sviswa7 Did you already test this with `sde` and the `-future` flag? Once this is fully reviewed I can also run our internal testing, just let me know when you are ready :) Hello Emanuel (@eme64 ), Could you please run the tests and let me know? 
Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2899521127 From duke at openjdk.org Thu May 22 03:05:51 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 22 May 2025 03:05:51 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Wed, 21 May 2025 12:43:46 GMT, Hamlin Li wrote: > > The last change on the code after L_fill_elements make the byte store 1 by 1, it seems that 'true' or 'false' may not affect it when the count is less or equal 7? > > Ah, I see. > > > I think the above align section is like a fast path which can reduce instruction for the count is large than 7, so we may should check the result when count > 7 ? besides, I'm testing on that right now(delete the align tail part), we can find it out later > > Thanks, let's see the test result. This is the result without 'fast' path; it shows that when count is 255 and 63, the result change 52 -> 54 or 50 -> 53. This change will lead to a deterioration of the results, but the degree of deterioration is relatively small
Benchmark                       (aligned)  (size)  Mode  Cnt   Score   Error  Units
MemorySegmentFillUnsafe.panama       true       1  avgt   30  23.865 ± 0.325  ns/op
MemorySegmentFillUnsafe.panama       true       2  avgt   30  20.676 ± 0.006  ns/op
MemorySegmentFillUnsafe.panama       true       3  avgt   30  20.850 ± 0.108  ns/op
MemorySegmentFillUnsafe.panama       true       4  avgt   30  19.655 ± 0.227  ns/op
MemorySegmentFillUnsafe.panama       true       5  avgt   30  21.168 ± 0.571  ns/op
MemorySegmentFillUnsafe.panama       true       6  avgt   30  20.840 ± 0.092  ns/op
MemorySegmentFillUnsafe.panama       true       7  avgt   30  21.399 ± 0.093  ns/op
MemorySegmentFillUnsafe.panama       true       8  avgt   30  25.165 ± 0.082  ns/op
MemorySegmentFillUnsafe.panama       true      15  avgt   30  30.974 ± 0.159  ns/op
MemorySegmentFillUnsafe.panama       true      16  avgt   30  26.323 ± 0.014  ns/op
MemorySegmentFillUnsafe.panama       true      63  avgt   30  48.771 ± 0.267  ns/op
MemorySegmentFillUnsafe.panama       true      64  avgt   30  49.583 ± 0.459  ns/op
MemorySegmentFillUnsafe.panama       true     255  avgt   30  63.744 ± 0.158  ns/op
MemorySegmentFillUnsafe.panama       true     256  avgt   30  62.142 ± 0.202  ns/op
MemorySegmentFillUnsafe.panama      false       1  avgt   30  23.184 ± 0.006  ns/op
MemorySegmentFillUnsafe.panama      false       2  avgt   30  20.673 ± 0.005  ns/op
MemorySegmentFillUnsafe.panama      false       3  avgt   30  20.708 ± 0.044  ns/op
MemorySegmentFillUnsafe.panama      false       4  avgt   30  19.438 ± 0.037  ns/op
MemorySegmentFillUnsafe.panama      false       5  avgt   30  20.857 ± 0.135  ns/op
MemorySegmentFillUnsafe.panama      false       6  avgt   30  20.966 ± 0.260  ns/op
MemorySegmentFillUnsafe.panama      false       7  avgt   30  21.327 ± 0.031  ns/op
MemorySegmentFillUnsafe.panama      false       8  avgt   30  25.064 ± 0.007  ns/op
MemorySegmentFillUnsafe.panama      false      15  avgt   30  30.884 ± 0.134  ns/op
MemorySegmentFillUnsafe.panama      false      16  avgt   30  26.942 ± 0.008  ns/op
MemorySegmentFillUnsafe.panama      false      63  avgt   30  48.329 ± 0.157  ns/op
MemorySegmentFillUnsafe.panama      false      64  avgt   30  48.248 ± 0.232  ns/op
MemorySegmentFillUnsafe.panama      false     255  avgt   30  62.617 ± 0.162  ns/op
MemorySegmentFillUnsafe.panama      false     256  avgt   30  62.126 ± 0.375  ns/op
MemorySegmentFillUnsafe.unsafe       true       1  avgt   30  21.298 ± 0.005  ns/op
MemorySegmentFillUnsafe.unsafe       true       2  avgt   30  23.749 ± 0.303  ns/op
MemorySegmentFillUnsafe.unsafe       true       3  avgt   30  22.139 ± 0.194  ns/op
MemorySegmentFillUnsafe.unsafe       true       4  avgt   30  23.803 ± 0.015  ns/op
MemorySegmentFillUnsafe.unsafe       true       5  avgt   30  23.810 ± 0.009  ns/op
MemorySegmentFillUnsafe.unsafe       true       6  avgt   30  24.698 ± 0.225  ns/op
MemorySegmentFillUnsafe.unsafe       true       7  avgt   30  24.652 ± 0.199  ns/op
MemorySegmentFillUnsafe.unsafe       true       8  avgt   30  34.475 ± 0.027  ns/op
MemorySegmentFillUnsafe.unsafe       true      15  avgt   30  36.370 ± 0.053  ns/op
MemorySegmentFillUnsafe.unsafe       true      16  avgt   30  34.472 ± 0.035  ns/op
MemorySegmentFillUnsafe.unsafe       true      63  avgt   30  38.239 ± 0.055  ns/op
MemorySegmentFillUnsafe.unsafe       true      64  avgt   30  39.208 ± 0.041  ns/op
MemorySegmentFillUnsafe.unsafe       true     255  avgt   30  53.486 ± 0.239  ns/op
MemorySegmentFillUnsafe.unsafe       true     256  avgt   30  51.814 ± 0.091  ns/op
MemorySegmentFillUnsafe.unsafe      false       1  avgt   30  21.512 ± 0.200  ns/op
MemorySegmentFillUnsafe.unsafe      false       2  avgt   30  23.700 ± 0.264  ns/op
MemorySegmentFillUnsafe.unsafe      false       3  avgt   30  22.036 ± 0.146  ns/op
MemorySegmentFillUnsafe.unsafe      false       4  avgt   30  23.809 ± 0.019  ns/op
MemorySegmentFillUnsafe.unsafe      false       5  avgt   30  23.596 ± 0.203  ns/op
MemorySegmentFillUnsafe.unsafe      false       6  avgt   30  24.735 ± 0.249  ns/op
MemorySegmentFillUnsafe.unsafe      false       7  avgt   30  24.438 ± 0.008  ns/op
MemorySegmentFillUnsafe.unsafe      false       8  avgt   30  31.339 ± 0.012  ns/op
MemorySegmentFillUnsafe.unsafe      false      15  avgt   30  37.700 ± 0.352  ns/op
MemorySegmentFillUnsafe.unsafe      false      16  avgt   30  36.885 ± 0.053  ns/op
MemorySegmentFillUnsafe.unsafe      false      63  avgt   30  39.048 ± 0.078  ns/op
MemorySegmentFillUnsafe.unsafe      false      64  avgt   30  39.057 ± 0.367  ns/op
MemorySegmentFillUnsafe.unsafe      false     255  avgt   30  54.066 ± 0.434  ns/op
MemorySegmentFillUnsafe.unsafe      false     256  avgt   30  53.200 ± 0.065  ns/op
------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2899763003 From kvn at openjdk.org Thu May 22 03:55:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 03:55:38 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs Message-ID: After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 or # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 # fatal error: meet not symmetric or other strange issues during C2 compilation After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 After internal discussion we decided disable all runtime stubs caching. There is no guarantee that we may not have issues with C1 stubs too. I propose hard code AOTStubCaching flag to `false` value until the issue is solved.
------------- Commit messages: - 8357514: Disable AOT caching for runtime stubs Changes: https://git.openjdk.org/jdk/pull/25379/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25379&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357514 Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25379.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25379/head:pull/25379 PR: https://git.openjdk.org/jdk/pull/25379 From iveresov at openjdk.org Thu May 22 03:55:38 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 22 May 2025 03:55:38 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs In-Reply-To: References: Message-ID: On Thu, 22 May 2025 03:46:39 GMT, Vladimir Kozlov wrote: > After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: > > # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 > # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 > > or > > # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 > # fatal error: meet not symmetric > > or other strange issues during C2 compilation > > After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 > > After internal discussion we decided disable all runtime stubs caching. > There is no guarantee that we may not have issues with C1 stubs too. > > I propose hard code AOTStubCaching flag to `false` value until the issue is solved. Looks good and trivial Marked as reviewed by iveresov (Reviewer). ------------- Marked as reviewed by iveresov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25379#pullrequestreview-2859687567 PR Review: https://git.openjdk.org/jdk/pull/25379#pullrequestreview-2859687983 From kvn at openjdk.org Thu May 22 03:55:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 03:55:38 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs In-Reply-To: References: Message-ID: On Thu, 22 May 2025 03:46:39 GMT, Vladimir Kozlov wrote: > After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: > > # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 > # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 > > or > > # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 > # fatal error: meet not symmetric > > or other strange issues during C2 compilation > > After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 > > After internal discussion we decided disable all runtime stubs caching. > There is no guarantee that we may not have issues with C1 stubs too. > > I propose hard code AOTStubCaching flag to `false` value until the issue is solved. 
@ashu-mehra please look ------------- PR Comment: https://git.openjdk.org/jdk/pull/25379#issuecomment-2899816129 From kvn at openjdk.org Thu May 22 03:59:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 03:59:50 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs In-Reply-To: References: Message-ID: <1plFos0_ZMV-0VYGRrIbCzFZ3b4bXe3Md9so-uWUoeA=.8794d55b-5cf9-4b43-8191-fb65ce742cdc@github.com> On Thu, 22 May 2025 03:52:33 GMT, Igor Veresov wrote: >> After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 >> # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 >> >> or >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 >> # fatal error: meet not symmetric >> >> or other strange issues during C2 compilation >> >> After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: >> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 >> >> After internal discussion we decided disable all runtime stubs caching. >> There is no guarantee that we may not have issues with C1 stubs too. >> >> I propose hard code AOTStubCaching flag to `false` value until the issue is solved. > > Marked as reviewed by iveresov (Reviewer). 
Thank you, @veresov ------------- PR Comment: https://git.openjdk.org/jdk/pull/25379#issuecomment-2899825901 From kvn at openjdk.org Thu May 22 03:59:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 03:59:51 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs In-Reply-To: References: Message-ID: On Thu, 22 May 2025 03:46:39 GMT, Vladimir Kozlov wrote: > After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: > > # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 > # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 > > or > > # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 > # fatal error: meet not symmetric > > or other strange issues during C2 compilation > > After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 > > After internal discussion we decided disable all runtime stubs caching. > There is no guarantee that we may not have issues with C1 stubs too. > > I propose hard code AOTStubCaching flag to `false` value until the issue is solved. I will wait GHA and mach5 testing finished. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25379#issuecomment-2899826456 From iklam at openjdk.org Thu May 22 04:10:59 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 22 May 2025 04:10:59 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs In-Reply-To: References: Message-ID: On Thu, 22 May 2025 03:46:39 GMT, Vladimir Kozlov wrote: > After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: > > # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 > # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 > > or > > # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 > # fatal error: meet not symmetric > > or other strange issues during C2 compilation > > After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 > > After internal discussion we decided disable all runtime stubs caching. > There is no guarantee that we may not have issues with C1 stubs too. > > I propose hard code AOTStubCaching flag to `false` value until the issue is solved. test/hotspot/jtreg/runtime/cds/appcds/aotCode/AOTCodeFlags.java line 54: > 52: Tester t = new Tester(); > 53: // Run only 2 modes (0 - no AOT code, 1 - AOT adapters) until stubs caching is restored > 54: for (int mode = 0; mode < 2; mode++) { For tracking purposes, please add JDK-8357398 to the comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25379#discussion_r2101581623 From duke at openjdk.org Thu May 22 04:39:43 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 22 May 2025 04:39:43 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v16] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time > > on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below > > before the patch >
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: delete the path for code size and test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/4021ff5e..fe8258ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=14-15 Stats: 50 lines in 1 file changed: 0 ins; 26 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Thu May 22 04:39:43 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 22 May 2025 04:39:43 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Thu, 22 May 2025 03:03:13 GMT, Anjian-Wen wrote: > > The last change on the code after L_fill_elements make the byte store 1 by 1, it seems that 'true' or 'false' may not affect it when the count is less or equal 7? > > Ah, I see. > > > I think the above align section is like a fast path which can reduce instruction for the count is large than 7, so we may should check the result when count > 7 ?
besides, I'm testing on that right now(delete the align tail part), we can find it out later > > Thanks, let's see the test result. @Hamlin-Li The original branch rarely walks, which is almost never tested, and the performance impact of the "fast" path is relatively small. I think you are right, it is suitable to be deleted ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2899870707 From kvn at openjdk.org Thu May 22 04:56:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 04:56:37 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs [v2] In-Reply-To: References: Message-ID: > After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: > > # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 > # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 > > or > > # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 > # fatal error: meet not symmetric > > or other strange issues during C2 compilation > > After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 > > After internal discussion we decided disable all runtime stubs caching. > There is no guarantee that we may not have issues with C1 stubs too. > > I propose hard code AOTStubCaching flag to `false` value until the issue is solved.
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Address comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25379/files - new: https://git.openjdk.org/jdk/pull/25379/files/a375393d..e238ffb1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25379&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25379&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25379.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25379/head:pull/25379 PR: https://git.openjdk.org/jdk/pull/25379 From kvn at openjdk.org Thu May 22 04:56:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 04:56:37 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs [v2] In-Reply-To: References: Message-ID: <0bfqiRpS0jlL9GNPLN_ai8fsH8maHKyBQXWMQSyv4dg=.d1c1065f-b8dc-4a14-80e0-9db84543c3b2@github.com> On Thu, 22 May 2025 04:07:56 GMT, Ioi Lam wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address comment > > test/hotspot/jtreg/runtime/cds/appcds/aotCode/AOTCodeFlags.java line 54: > >> 52: Tester t = new Tester(); >> 53: // Run only 2 modes (0 - no AOT code, 1 - AOT adapters) until stubs caching is restored >> 54: for (int mode = 0; mode < 2; mode++) { > > For tracking purposes, please add JDK-8357398 to the comment. 
done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25379#discussion_r2101620660 From iveresov at openjdk.org Thu May 22 05:29:59 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 22 May 2025 05:29:59 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 04:56:37 GMT, Vladimir Kozlov wrote: >> After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 >> # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 >> >> or >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 >> # fatal error: meet not symmetric >> >> or other strange issues during C2 compilation >> >> After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: >> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 >> >> After internal discussion we decided disable all runtime stubs caching. >> There is no guarantee that we may not have issues with C1 stubs too. >> >> I propose hard code AOTStubCaching flag to `false` value until the issue is solved. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address comment Marked as reviewed by iveresov (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25379#pullrequestreview-2859816957 From kvn at openjdk.org Thu May 22 05:38:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 05:38:52 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 05:27:07 GMT, Igor Veresov wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address comment > > Marked as reviewed by iveresov (Reviewer). Thank you @veresov and @iklam for reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25379#issuecomment-2899967080 From iklam at openjdk.org Thu May 22 05:45:53 2025 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 22 May 2025 05:45:53 GMT Subject: RFR: 8357514: Disable AOT caching for runtime stubs [v2] In-Reply-To: References: Message-ID: <3oKniHSwmo2-t6ujA2CFY1hzck0DcgxpXieoZL2Edng=.fb3c5d2e-69ec-401b-8820-da4c93d023f8@github.com> On Thu, 22 May 2025 04:56:37 GMT, Vladimir Kozlov wrote: >> After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 >> # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 >> >> or >> >> # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 >> # fatal error: meet not symmetric >> >> or other strange issues during C2 compilation >> >> After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: >> >> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 >> >> After internal discussion we decided disable all runtime stubs caching. >> There is no guarantee that we may not have issues with C1 stubs too. 
>> >> I propose hard code AOTStubCaching flag to `false` value until the issue is solved. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address comment Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25379#pullrequestreview-2859853854 From xgong at openjdk.org Thu May 22 05:48:57 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 22 May 2025 05:48:57 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: On Mon, 19 May 2025 09:23:28 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/superword.cpp >> >> Co-authored-by: Manuel Hässig > >> Impressive analysis, Emanuel! Very deep, thorough, and insightful. > > +1 to this. Great work, Emanuel! The fix looks good to me. > @TobiHartmann Thank you for the review :) > > @theRealAph @XiaohongGong Do you have any idea about the somewhat confusing behavior of aarch64 in these benchmarks? Hi @eme64 , to be honest, I'm not quite sure about the unaligned memory access behavior on AArch64. I tried to make it clear by reading some ARM docs. But unfortunately, the message that I got most is it's HW implementation defined behavior. Some AArch64 micro-architectures prefer aligning memory for loads instead of stores to obtain better performance, but others maybe on the contrary. That's the reality. My colleague provided to me several patches in go project which also use an option to prefer load alignment or store for a memory move library optimization [1][2][3] on AArch64. Different AArch64 micro-architecture can choose the optimal alignment solution based on the performance results. And it chooses to align loads for Neoverse CPUs by default. Hope this could help you.
I think the basic ideal is align with what you did in this PR. Thanks! [1] https://go-review.googlesource.com/c/go/+/243357 [2] https://github.com/golang/go/blob/7f806c1052aa919c1c195a5b2223626beab2495c/src/runtime/cpuflags_arm64.go#L11 [3] https://go-review.googlesource.com/c/go/+/664038 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2899989790 From epeter at openjdk.org Thu May 22 05:49:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 05:49:51 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: <_Lde8-U9gDYDIMGDWg5aQKrWTq2suhAmkz97-xy54QU=.b73a1b9b-64dc-49f7-95d5-d00bd98a67eb@github.com> Message-ID: On Wed, 21 May 2025 14:00:38 GMT, Zdenek Zambersky wrote: >> @zzambers Are you still working on this? > > @eme64 Sorry for late response, > > I have updated change set to use approach with `-XX:+IgnoreUnrecognizedVMOptions`. Tested locally with `hotspot_compiler` tests. Rebased to current master. > > `LateInlinePrinting.java` test keeps using `@requires`, because output printed by ClientVM is different (with unrecognized options ignored). @zzambers Can you please update the PR description to match your current changes? And then you should copy that PR description to JIRA :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2899992093 From epeter at openjdk.org Thu May 22 05:56:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 05:56:56 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 13:57:07 GMT, Zdenek Zambersky wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > Fix of compiler tests for client VM I quickly looked through the changes, and I think it looks ok. It's a little painful to have to add it everywhere though... It also increases the risk of misspelled flags, or using removed flags etc. But I don't have a great alternative solution. I'll run some internal testing now, please ping me again in 24h :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2900001590 From epeter at openjdk.org Thu May 22 06:03:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 06:03:55 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Wed, 21 May 2025 20:02:27 GMT, Sandhya Viswanathan wrote: >>> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) >> >> Please use the latest version > >> > @jatin-bhateja @sviswa7 Can you explain the impact of the `EVEX_HVM`, `EVEX_QVM` etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct? >> >> @eme64 In EVEX the displacement for memory in the addressing mode is encoded using compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc denote tuple type and are used to determine the scaling factor for displacement. Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in [Intel SDM Volume 2](https://cdrdv2.intel.com/v1/dl/getContent/671110). So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is non zero. 
> > For testing, the best way would be to create a SIMD instruction encoding test tool on similar lines as https://github.com/openjdk/jdk/commit/52d752c43b3a9935ea97051c39adf381084035cc in a separate future PR. @sviswa7 Thanks for the explanations! Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors? I'll re-run testing now, just to be sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2900013532 From kvn at openjdk.org Thu May 22 06:12:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 06:12:55 GMT Subject: Integrated: 8357514: Disable AOT caching for runtime stubs In-Reply-To: References: Message-ID: On Thu, 22 May 2025 03:46:39 GMT, Vladimir Kozlov wrote: > After [JDK-8354887](https://bugs.openjdk.org/browse/JDK-8354887) was integrated we hit strange failures which looks like memory stomps during our JCK testing of AOT new JEPs: > > # Internal Error (/workspace/open/src/hotspot/share/opto/regmask.hpp:222), pid=4186624, tid=4186658 > # assert(_RM_UP[i] == 0) failed: _hwm too low: 5 regs at: 4 > > or > > # Internal Error (/workspace/open/src/hotspot/share/opto/type.cpp:996), pid=2832821, tid=2832868 > # fatal error: meet not symmetric > > or other strange issues during C2 compilation > > After investigating (running tests in loop) I narrowed down the issue to AOT caching of C2 runtime stubs: > > https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/output.cpp#L3491 > > After internal discussion we decided disable all runtime stubs caching. > There is no guarantee that we may not have issues with C1 stubs too. > > I propose hard code AOTStubCaching flag to `false` value until the issue is solved. This pull request has now been integrated. 
Changeset: 8184ce39 Author: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/8184ce39a8a732352ee841fed09cae905d27643c Stats: 5 lines in 2 files changed: 4 ins; 0 del; 1 mod 8357514: Disable AOT caching for runtime stubs Reviewed-by: iveresov, iklam ------------- PR: https://git.openjdk.org/jdk/pull/25379 From epeter at openjdk.org Thu May 22 06:19:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 06:19:55 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> Message-ID: <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> On Wed, 21 May 2025 16:44:35 GMT, Bhavana Kilambi wrote: >> @Bhavana-Kilambi Ok, yes, 20min is a bit excessive. >> >> Generally, we should periodically run all vector tests with various `MaxVectorSize` settings. But doing that all the time is often too time consuming. For some specific tests, it can make sense though to iterate over multiple sizes. >> >> I wonder if you could also reduce the runtime of the test in other ways? Maybe reduce the warmup? It seems a bit excessive to do `10000` warmup iterations, which each execute a loop with many iterations themselves. > > Hi @eme64 I removed the `@Warmup` entirely and the test does pass on aarch64. Although I am a bit afraid to fully remove it as it could sometimes lead to the loop not being warm enough for c2 vectorization to kick in. I haven't tried with different values of the warmup iterations though. Do you think it's ok to remove it entirely? @Bhavana-Kilambi The TestFramework actually forces C2 compilation:
- runs warmup iterations, maybe C2 triggers automatically because there are enough iterations.
- Once warmup is over, the TestFramework checks if the method is already compiled, if not, it enqueues it.
- In the end, we know it is C2 compiled, which gives us the C2 IR we can match with.
In my experience, having low warmup count works in most cases. Except when you need profiling data. If you have zero warmup, we basically have compilation with `-Xcomp`. So it really depends on your specific case. In general, I would avoid doing an `Xcomp` compilation / zero warmup, because then we do not test normal compilation with profiling. And compilation with profiling is more important I think. But in cases where you have a large loop in the test method, we would trigger OSR and normal compilation with profiling rather soon anyway. So lowering the warmup is ok. How many loop iterations do we need for OSR? `product(intx, Tier4BackEdgeThreshold, 40000`. We could round that up to `100_000`, just to be sure. With `LEN = 2048`, you would thus only need about `50` invocations of the tests during warmup to reach C2 compilation. Hence, the current `@Warmup(10000)` is much too high, I think. You could cut down the runtime by about a factor of `100` here, if my math is correct :exploding_head: What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2900043919 From epeter at openjdk.org Thu May 22 06:29:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 06:29:56 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another.
After `Ideal` returns, all usages of the old node now take the new node instead. >>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Ideal`. There it made sense to replace `StoreB` nodes that have a memory output with `StoreI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. >>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64 , I understand your concern. In this patch , I check the usage of all `loadB` nodes and only allow they have only single usage into `OrNode`, I also check the `OrNode` as well. So I think it will not cause the trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> For this case, because l1 has other usage, all these loads will not be merged. >> >> In my previous patch, I tried to extract value from merged `LoadNode` if origin `loadB` has other usage, such as used by uncommon trap. You can find them in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 and search MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced `LoadB` used by uncommon trap, I think range check smearing remove all the uncommon trap usages. So I revert it to make code simple.
In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to evaluate cost of new instructions which used for extracting and benefit of merged load. To simplify, I choose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? > > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > >> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. > > This is really the pattern we use in `Ideal`. We replace the node at the bottom of an expression with a new node (or new expression). > @eme64 Can we complete the integration in JDK25 before June 5th? I've been waiting on a ping from @kuaiwei , since there were still new changes added last week, and I don't know if there are more coming. I'm a little slow with reviewing, as I also have a lot of other work to do and PRs to review. Feel free to find other reviewers to help speed up the process ;) Being so close to RDP1 (June 5th) usually makes us a little hesitant to integrate larger features where there is often a bug tail. I would feel better if we could integrate it in early JDK26, and then we have more time to fix the follow-up bugs during JDK26 development.
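For readers following this thread, the load/shift/or shape under discussion can be sketched in plain Java. This is only an illustrative sketch: the class and method names are hypothetical and not taken from the PR, and whether C2 actually merges the byte loads depends on the patch under review and on the platform. The variant with an extra use of one byte mirrors the case from the quoted discussion in which, per the author's reply, the loads are left unmerged.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; none of these names come from the PR.
final class MergeLoadsSketch {

    // Four adjacent byte loads combined with shift/or into one int
    // (little-endian order). This is the load/shift/or expression shape.
    static int loadIntByBytes(byte[] a, int i) {
        return (a[i] & 0xFF)
             | ((a[i + 1] & 0xFF) << 8)
             | ((a[i + 2] & 0xFF) << 16)
             | ((a[i + 3] & 0xFF) << 24);
    }

    // Reference result: a single 4-byte little-endian read of the same bytes.
    static int loadIntDirect(byte[] a, int i) {
        return ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).getInt(i);
    }

    // Same expression, but the second byte value is also used outside the
    // shift/or tree, which is the "other uses" case raised in the review.
    static int loadIntWithExtraUse(byte[] a, int i) {
        int b1 = a[i + 1] & 0xFF;
        int merged = (a[i] & 0xFF)
                   | (b1 << 8)
                   | ((a[i + 2] & 0xFF) << 16)
                   | ((a[i + 3] & 0xFF) << 24);
        return merged + b1;
    }

    public static void main(String[] args) {
        byte[] a = {0x78, 0x56, 0x34, 0x12, (byte) 0xEF};
        System.out.println(Integer.toHexString(loadIntByBytes(a, 0)));
        System.out.println(loadIntByBytes(a, 1) == loadIntDirect(a, 1));
        System.out.println(Integer.toHexString(loadIntWithExtraUse(a, 0)));
    }
}
```

A test along the lines requested in the review would compare `loadIntByBytes` against `loadIntDirect` for many offsets, and separately check that the extra-use variant still returns the right value whether or not the loads get merged.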
------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2900063065 From epeter at openjdk.org Thu May 22 06:35:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 06:35:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v14] In-Reply-To: <3Tq3ef_yFX56l4-rKyLFx_r7iJfrHkOWJ9K65beGmV8=.5b8d258f-1d40-47d4-9631-ad02e8dc0ecd@github.com> References: <9lqWA3ERAtAuuUDJNS0gIQDtN-RTOH_C-sxC_4ALH5g=.46c2438c-bdf6-43e1-847d-56c6c51e5454@github.com> <3Tq3ef_yFX56l4-rKyLFx_r7iJfrHkOWJ9K65beGmV8=.5b8d258f-1d40-47d4-9631-ad02e8dc0ecd@github.com> Message-ID: On Wed, 21 May 2025 23:49:46 GMT, Srinivas Vamsi Parasa wrote: >> @vamsi-parasa @sviswa7 Did you already test this with `sde` and the `-future` flag? Once this is fully reviewed I can also run our internal testing, just let me know when you are ready :) > > Hello Emanuel (@eme64 ), > > Could you please run the tests and let me know? > > Thanks, > Vamsi @vamsi-parasa Testing launched, ping me again in 24h :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2900074663 From epeter at openjdk.org Thu May 22 06:45:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 06:45:01 GMT Subject: RFR: 8355094: Performance drop in auto-vectorized kernel due to split store [v2] In-Reply-To: References: Message-ID: <1po0_2e_mb-DbLCwPoy4ODxxQNrh4JlaKgSaQ5_HT0o=.161ef677-4c1b-45f5-8194-b5548f45d4d3@github.com> On Thu, 22 May 2025 05:46:02 GMT, Xiaohong Gong wrote: >>> Impressive analysis, Emanuel! Very deep, thorough, and insightful. >> >> +1 to this. Great work, Emanuel! The fix looks good to me. > >> @TobiHartmann Thank you for the review :) >> >> @theRealAph @XiaohongGong Do you have any idea about the somewhat confusing behavior of aarch64 in these benchmarks? > > Hi @eme64 , to be honest, I'm not quite sure about the unaligned memory access behavior on AArch64. 
I tried to make it clear by reading some ARM docs. But unfortunately, the message that I got most is it's HW implementation defined behavior. Some AArch64 micro-architectures prefer aligning memory for loads instead of stores to obtain better performance, but others maybe on the contrary. That's the reality. > > My colleague provided to me several patches in go project which also use an option to prefer load alignment or store for a memory move library optimization [1][2][3] on AArch64. Different AArch64 micro-architecture can choose the optimal alignment solution based on the performance results. And it chooses to align loads for Neoverse CPUs by default. Hope this could help you. I think the basic ideal is align with what you did in this PR. Thanks! > > [1] https://go-review.googlesource.com/c/go/+/243357 > [2] https://github.com/golang/go/blob/7f806c1052aa919c1c195a5b2223626beab2495c/src/runtime/cpuflags_arm64.go#L11 > [3] https://go-review.googlesource.com/c/go/+/664038 @XiaohongGong Thanks a lot for taking the time to respond! That is very fascinating, and reassuring. Seems I'm not the only one seeing these kinds of results :) I suppose we could add a similar flag, to target the `aarch64` machines where load alignment is preferred. But from what I see the wins would be marginal, and I don't know `aarch64` enough to figure out which implementations would benefit. 
But if anyone wants to take this on, I'd be happy to review the PR ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25065#issuecomment-2900093404 From kwei at openjdk.org Thu May 22 06:54:01 2025 From: kwei at openjdk.org (Kuai Wei) Date: Thu, 22 May 2025 06:54:01 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: On Thu, 22 May 2025 06:26:57 GMT, Emanuel Peter wrote: >> @kuaiwei Thanks for your response! >> >> What about these two things I brought up? >> >>> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? >> >> It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. >> >>> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. >> >> This is really the pattern we use in `Ideal`. We replace the node at the bottom of an expression with a new node (or new expression). > >> @eme64 Can we complete the integration in JDK25 before June 5th? > > I've been waiting on a ping from @kuaiwei , since there were still new changes added last week, and I don't know if there are more coming. I'm a little slow with reviewing, as I also have a lot of other work to do and PRs to review. Feel free to find other reviewers to help speed up the process ;) > > Being so close to RDP1 (June 5th) usually makes us a little hesitant to integrate larger features where there is often a bug tail. I would feel better if we could integrate it in early JDK26, and then we have more time to fix the follow-up bugs during JDK26 development.
@eme64 @wenshao I have a little change to this PR. I will send it soon. Thanks for your patience. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2900113150 From aboldtch at openjdk.org Thu May 22 06:54:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 22 May 2025 06:54:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 23:33:47 GMT, Sandhya Viswanathan wrote: > Intel APX add additional GPR registers (R16 - R31). Our understanding is that these also need to be saved and restored as part of ZRuntimeCallSpill. Is that correct? ZRuntimeCallSpill is used when doing calls into libjvm from contexts where we do not track the liveness of the registers. So all caller saved registers must be saved and restored. If all APX registers are caller saved, then yes this is correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2900114700 From rehn at openjdk.org Thu May 22 06:56:36 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 22 May 2025 06:56:36 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: References: Message-ID: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> > Hi, please consider. > > While working on https://github.com/openjdk/jdk/pull/25252, I notice: > - Major op code was just repeat > - Width coded in binary > - Stores have mixed up rs1 and rs2 > - Bonus, fsd used a macro for no reason > > I think this improves readability. > > Tested tier1 > > Thanks, Robbin Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Rd/Rs->Rs2/Rs1 - Merge branch 'master' into asm_fixes - Fixed flh/flw/fld - Merge branch 'master' into asm_fixes - Fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25253/files - new: https://git.openjdk.org/jdk/pull/25253/files/2d658948..aec6511a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25253&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25253&range=01-02 Stats: 40289 lines in 432 files changed: 27716 ins; 9995 del; 2578 mod Patch: https://git.openjdk.org/jdk/pull/25253.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25253/head:pull/25253 PR: https://git.openjdk.org/jdk/pull/25253 From epeter at openjdk.org Thu May 22 07:05:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 07:05:58 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: <9ABhENoZtR76wmsgRmzeEceDvCvoflfCcbDbK8H2rso=.e351f63f-1331-4e2e-8a02-763a8c0c4f70@github.com> On Thu, 22 May 2025 06:51:02 GMT, Kuai Wei wrote: >>> @eme64 Can we complete the integration in JDK25 before June 5th? >> >> I've been waiting on a ping from @kuaiwei , since there were still new changes added last week, and I don't know if there are more coming. I'm a little slow with reviewing, as I also have a lot of other work to do and PRs to review. Feel free to find other reviewers to help speed up the process ;) >> >> Being so close to RDP1 (June 5th) usually makes us a little hesitant to integrate larger features where there is often a bug tail. I would feel better if we could integrate it in early JDK26, and then we have more time to fix the follow-up bugs during JDK26 development. 
> > @eme64 @wenshao I have a little change to this PR. I will send it soon. Thanks for your patience. @kuaiwei I'm not in a rush with this one. I'd rather we have a good design and be reasonably sure that it is correct, rather than rush it now and having to do extra cycles fixing things later ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2900141305 From mli at openjdk.org Thu May 22 07:12:52 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 May 2025 07:12:52 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: <-pDBz-72cFZEUgkA92BL9cu5us7zSgHZJJZfYL4s2WM=.01bc67a1-d9e7-480b-82f7-2c432e02d6ad@github.com> On Thu, 22 May 2025 06:56:36 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> While working on https://github.com/openjdk/jdk/pull/25252, I notice: >> - Major op code was just repeat >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Rd/Rs->Rs2/Rs1 > - Merge branch 'master' into asm_fixes > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes I think the original rd/rs is easier to understand the meaning of the parameters. Although the changed one is consistent with the specification, it seems to me that it is more error-prone to use it, maybe it's just my feeling? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2900157056 From rcastanedalo at openjdk.org Thu May 22 07:20:04 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 May 2025 07:20:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v53] In-Reply-To: <2WOGu4F76zeKo3VTQe4GNuA1rTZKVbyyhUEcAhrmjt4=.66c348ca-0016-4c9b-8af4-3007bde64c71@github.com> References: <2WOGu4F76zeKo3VTQe4GNuA1rTZKVbyyhUEcAhrmjt4=.66c348ca-0016-4c9b-8af4-3007bde64c71@github.com> Message-ID: On Wed, 21 May 2025 13:42:19 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > improve documentation in TestTemplate.java test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 85: > 83: comp.compile(); > 84: > 85: // Object ret = p.xyz.InnerTest1.main(); This comment is incomplete, I'd suggest removing it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2100381117 From rcastanedalo at openjdk.org Thu May 22 07:20:10 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 May 2025 07:20:10 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v55] In-Reply-To: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com> References: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com> Message-ID: <1w-eYiC9yF0tVUdC57G0Hoe7Xol5JkPPR_V3cT2I8ww=.659b7797-5254-4f39-b092-ec279a4ef57d@github.com> On Wed, 21 May 2025 15:24:18 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > move order in tutorial test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 485: > 483: // can access these fields and registers again with "dataNames()". 
> 484: // > 485: // Here a few use-cases: Suggestion: // Here are a few use-cases: test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 493: > 491: // an inner Template could read from or even write to. You can "addDataName" the > 492: // variable, and the inner Template can then find that variable in "dataNames()". > 493: // If the inner Template wants to find a random field or varialbe, it may sample Suggestion: // If the inner Template wants to find a random field or variable, it may sample test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 494: > 492: // variable, and the inner Template can then find that variable in "dataNames()". > 493: // If the inner Template wants to find a random field or varialbe, it may sample > 494: // from "dataNodes()", and with some probability, it would sample the your variable. Suggestion: // from "dataNodes()", and with some probability, it would sample your variable. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 551: > 549: > 550: public class InnerTest7 { > 551: // Let us define a some fields. Suggestion: // Let us define some fields. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 764: > 762: } > 763: > 764: // Even simpler: count the available variable and return the count immediately. Suggestion: // Even simpler: count the available variables and return the count immediately. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 955: > 953: // we want to store to a field or variable. We have to make sure that we > 954: // do not generate code that tries to store to a final field or variable. 
> 955: // In other cases, we are only want to load, and we do not care if the Suggestion: // In other cases, we only want to load, and we do not care if the test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 961: > 959: // is irrelevant, but with instances of Objects, this becomes relevant. > 960: // We may want to load an object of any field or variable of a certain > 961: // class, or any subclass. When a value of a given class, we can only Suggestion: // class, or any subclass. When a value is of a given class, we can only test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 997: > 995: // We only load from the field, so we do not need a mutable one, > 996: // we can load from final and non-final fields. > 997: // We want to find any field of which we can read the value and store Suggestion: // We want to find any field from which we can read the value and store test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1049: > 1047: myClassList.stream().map(c -> templateLoad.asToken(c)).toList(), > 1048: """ > 1049: // Now lets mutate some fields. Suggestion: // Now let us mutate some fields. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1074: > 1072: // We now introduce another set of "Names", the "StructuralNames". They are > 1073: // useful for modeling method names an class names, and possibly more. Anything > 1074: // that has a fixed name in the Java code, for which mutability is inapplicalbe. Suggestion: // that has a fixed name in the Java code, for which mutability is inapplicable. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1086: > 1084: // caught. > 1085: // > 1086: // Let us show an examples with Method names. But for simplicity, we assume they Suggestion: // Let us show an example with Method names. 
But for simplicity, we assume they test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1136: > 1134: // If we directly nest the templateMethod, then the addStructuralName goes to the nested > 1135: // scope, and is not available at the class scope, i.e. it is not visible > 1136: // for sampleStructuralName in outside of the templateMethod. Suggestion: // for sampleStructuralName outside of the templateMethod. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1141: > 1139: templateMethod.asToken("+"), > 1140: templateMethod.asToken("-"), > 1141: // However, if we insert to the CLASS_HOOK, then the Rendere makes the Suggestion: // However, if we insert to the CLASS_HOOK, then the Renderer makes the test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 121: > 119: > 120: public static void main(String[] args) { > 121: // The follwing tests all pass, i.e. have no errors during rendering. Suggestion: // The following tests all pass, i.e. have no errors during rendering. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101812122 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101813916 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101813623 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101814762 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101815298 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101815758 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101816278 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101816774 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101817110 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101817702 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101818057 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101818686 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101819135 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101807152 From rehn at openjdk.org Thu May 22 07:33:53 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 22 May 2025 07:33:53 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v2] In-Reply-To: References: Message-ID: <9Nkcpc7anxvP_1Voepk26q9S4B4ieTK1N-OlAuXY1SU=.745bdc22-74d1-420c-95f2-c94891ebd828@github.com> On Wed, 21 May 2025 08:36:09 GMT, Feilong Jiang wrote: >> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Fixed flh/flw/fld >> - Merge branch 'master' into asm_fixes >> - Fixes > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 3307: > >> 3305: >> 3306: // -------------------------- >> 3307: void sd(Register Rd, Register Rs, const int32_t offset) { > > We can rename `Rd`/`Rs` to `Rs2`/`Rs1` to be more consistent with the specification. Fixed > src/hotspot/cpu/riscv/assembler_riscv.hpp line 3322: > >> 3320: >> 3321: // -------------------------- >> 3322: void sw(Register Rd, Register Rs, const int32_t offset) { > > Same here. Fixed > src/hotspot/cpu/riscv/assembler_riscv.hpp line 3337: > >> 3335: >> 3336: // -------------------------- >> 3337: void fsd(FloatRegister Rd, Register Rs, const int32_t offset) { > > And here. Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2101847874 PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2101848112 PR Review Comment: https://git.openjdk.org/jdk/pull/25253#discussion_r2101848338 From rehn at openjdk.org Thu May 22 07:40:54 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 22 May 2025 07:40:54 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: On Thu, 22 May 2025 06:56:36 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> While working on https://github.com/openjdk/jdk/pull/25252, I notice: >> - Major op code was just repeat >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Rd/Rs->Rs2/Rs1 > - Merge branch 'master' into asm_fixes > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes I can imagine something like: `void _sd(Register Rs2_value, Register Rs1_address, const int32_t offset)` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2900217955 From fyang at openjdk.org Thu May 22 07:44:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 22 May 2025 07:44:56 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v16] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 04:39:43 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time >> >> on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below >> >> before the patch >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ±
0.936 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/o... > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > delete the path for code size and test Looks good to me. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1700: > 1698: __ addi(dest, dest, 1); > 1699: __ subi(count, count, 1); > 1700: __ bind(L_skip_align1); Leave a new line before this bind. src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1707: > 1705: __ addi(dest, dest, 2); > 1706: __ subi(count, count, 2); > 1707: __ bind(L_skip_align2); Leave a new line before this bind. ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23890#pullrequestreview-2860140072 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2101867919 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2101868367 From qxing at openjdk.org Thu May 22 07:53:39 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 22 May 2025 07:53:39 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v3] In-Reply-To: References: Message-ID: > In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated. > > loop-safepoint > > This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64). > > > Benchmark Mode Cnt Score Error Units > LoopSafepoint.loopVar avgt 15 208296.259 ± 1350.409 ns/op # baseline > LoopSafepoint.loopVar avgt 15 200692.874 ± 616.770 ns/op # this patch > > > Testing: tier1-2 on x86_64 and aarch64.
Qizheng Xing has updated the pull request incrementally with one additional commit since the last revision: Improve documentation comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23057/files - new: https://git.openjdk.org/jdk/pull/23057/files/56983ed5..1a216046 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=01-02 Stats: 20 lines in 1 file changed: 8 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/23057.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23057/head:pull/23057 PR: https://git.openjdk.org/jdk/pull/23057 From qxing at openjdk.org Thu May 22 07:56:01 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Thu, 22 May 2025 07:56:01 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: <5ovuOEYt-lBwRPKWDU5emB_JDWrvwEiNXRQKt5EwqkM=.c645ff3f-c2bd-4931-8e6c-1f6c8482c837@github.com> On Wed, 2 Apr 2025 07:22:13 GMT, Emanuel Peter wrote: >> The second question: >> >>> If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests? >> >> I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures. >> >> Running with `-XX:SafepointTimeoutDelay=500` caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. > > @MaxXSoft Would you mind improving the documentation comments, so that they are easier to understand? Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change? 
@eme64 Hello, I've updated the documentation comments to make them easier to understand. Could you please continue to review this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2900256366 From epeter at openjdk.org Thu May 22 07:58:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 07:58:51 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v56] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`.
> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - remove comment line for Roberto - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/62d4c499..5c277631 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=55 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=54-55 Stats: 15 lines in 2 files changed: 0 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Thu May 22 07:58:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 07:58:54 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:06:34 GMT, Roberto Castañeda Lozano wrote: >> A few more documentation suggestions, will continue reviewing this changeset over the next days. > >> @robcasloz I addressed all your comments :) > > Thanks @eme64!
@robcasloz Thanks for the suggestions, I applied them all :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2900260836 From epeter at openjdk.org Thu May 22 07:58:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 07:58:58 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v53] In-Reply-To: References: <2WOGu4F76zeKo3VTQe4GNuA1rTZKVbyyhUEcAhrmjt4=.66c348ca-0016-4c9b-8af4-3007bde64c71@github.com> Message-ID: On Wed, 21 May 2025 14:01:53 GMT, Roberto Casta?eda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> improve documentation in TestTemplate.java > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 85: > >> 83: comp.compile(); >> 84: >> 85: // Object ret = p.xyz.InnerTest1.main(); > > This comment is incomplete, I'd suggest removing it. removed :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101894346 From fjiang at openjdk.org Thu May 22 08:02:10 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 22 May 2025 08:02:10 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <-pDBz-72cFZEUgkA92BL9cu5us7zSgHZJJZfYL4s2WM=.01bc67a1-d9e7-480b-82f7-2c432e02d6ad@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> <-pDBz-72cFZEUgkA92BL9cu5us7zSgHZJJZfYL4s2WM=.01bc67a1-d9e7-480b-82f7-2c432e02d6ad@github.com> Message-ID: On Thu, 22 May 2025 07:10:24 GMT, Hamlin Li wrote: > I think the original rd/rs is easier to understand the meaning of the parameters. Although the changed one is consistent with the specification, it seems to me that it is more error-prone to use it, maybe it's just my feeling? Currently, we only use the store instructions with `Address` or `address`, which is less confusing. 
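For readers following the Rd/Rs versus Rs2/Rs1 naming discussion, here is a minimal sketch of the S-type store encoding from the RISC-V base ISA, in which rs2 holds the value to store and rs1 holds the base address. The helper names `encode_store`, `encode_sd` and `encode_sw` are hypothetical and chosen only for illustration; they are not the HotSpot `Assembler` API and do not reflect how `assembler_riscv.hpp` is actually written.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not HotSpot code): encodes a RISC-V S-type store.
// Field layout per the RISC-V base ISA:
//   imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
inline uint32_t encode_store(uint32_t funct3, uint32_t rs2_value,
                             uint32_t rs1_address, int32_t offset) {
    const uint32_t imm = static_cast<uint32_t>(offset) & 0xFFFu;  // 12-bit signed immediate
    return ((imm >> 5) << 25)              // imm[11:5]
         | ((rs2_value & 0x1Fu) << 20)     // rs2: register holding the value to store
         | ((rs1_address & 0x1Fu) << 15)   // rs1: register holding the base address
         | ((funct3 & 0x7u) << 12)         // access width (2 = word, 3 = doubleword)
         | ((imm & 0x1Fu) << 7)            // imm[4:0]
         | 0x23u;                          // STORE major opcode (0b0100011)
}

// sd rs2, offset(rs1): store doubleword.
inline uint32_t encode_sd(uint32_t rs2_value, uint32_t rs1_address, int32_t offset) {
    return encode_store(0x3u, rs2_value, rs1_address, offset);
}

// sw rs2, offset(rs1): store word.
inline uint32_t encode_sw(uint32_t rs2_value, uint32_t rs1_address, int32_t offset) {
    return encode_store(0x2u, rs2_value, rs1_address, offset);
}
```

With register numbers x1 (ra) and x2 (sp), `encode_sd(1, 2, 8)` corresponds to `sd ra, 8(sp)`. Spelling the parameters as value and address, as in this sketch, is one way to keep the specification's rs2/rs1 order while making the roles explicit.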
------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2900269135 From mli at openjdk.org Thu May 22 08:12:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 May 2025 08:12:57 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: <8olAoLcPpEHAc5CaDhG1G1dMKAzUJTzJNcfaDqSrY7g=.f8299638-a68e-4ac5-9dc9-8a509de011b3@github.com> On Thu, 22 May 2025 07:38:19 GMT, Robbin Ehn wrote: > I can imagine something like: `void _sd(Register Rs2_value, Register Rs1_address, const int32_t offset)` It's much better! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2900309942 From roland at openjdk.org Thu May 22 08:40:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 08:40:08 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) Message-ID: The test case has an out of loop `Store` with an `AddP` address expression that has other uses and is in the loop body. Schematically, only showing the address subgraph and the bases for the `AddP`s: Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 -> CastPP#110 Both `AddP`s have the same base, a `CastPP` that's also in the loop body. That loop is a counted loop and only has 3 iterations so is fully unrolled. First, one iteration is peeled: /-> CastPP#110 Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 -> AddP#277 -> AddP#278 -> CastPP#283 -> CastPP#283 The `AddP`s and `CastPP` are cloned (because in the loop body). As part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is called. It finds the test that guards `CastPP#283` in the peeled iteration dominates and replaces the test that guards `CastPP#110` (the test in the peeled iteration is the clone of the test in the loop). 
That causes `CastPP#110`'s control to be updated to that of the test in the peeled iteration and to be yanked from the loop. So now `CastPP#283` and `CastPP#110` have the same inputs. Next unrolling happens: /-> CastPP#110 /-> AddP#400 -> AddP#401 -> CastPP#110 Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 \ -> CastPP#110 -> AddP#277 -> AddP#278 -> CastPP#283 -> CastPP#283 `AddP`s are cloned once more but not the `CastPP`s because they are both in the peeled iteration now. A new `Phi` is added. Next igvn runs. It's going to push the `AddP`s through the `Phi`s. Through `Phi#477`: /-> CastPP#110 Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 \ -> AddP#134 -> CastPP#110 -> AddP#277 -> AddP#278 -> CastPP#283 -> CastPP#283 Through `Phi#360`: /-> AddP#134 -> CastPP#110 /-> Phi#509 -> AddP#401 -> CastPP#110 Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 -> Phi#514 -> CastPP#283 -> CastPP#110 Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is transformed into another `CastPP` at the `Phi` control with the data control of the `CastPP` as input. `PhiNode::unique_input()` with `uncast = true` is where that happens. That's where things go wrong I think. /-> AddP#134 -> CastPP#110 /-> Phi#509 -> AddP#401 -> CastPP#110 Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 -> CastPP#529 Next, the `AddP`s are pushed through `Phi#509`: /-> AddP#536 -> CastPP#110 Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 -> CastPP#529 `CastPP#110` and `CastPP#283` are commoned (they have the same inputs): /-> AddP#536 -> CastPP#110 Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#110 -> CastPP#529 Finally, the `AddP`s are pushed through `Phi#515`: Store#195 -> AddP#516 -> AddP#544 -> CastPP#110 -> CastPP#529 And we end up with 2 `AddP`s with different bases. The 2 `CastPP`s have the same data input but not the same control and igvn can't common them.
The fix I propose is to delay the call to `PhiNode::unique_input()` with `uncast = true` if the `Phi`'s inputs are cast nodes and have yet to be processed by igvn. This causes identical `CastPP`s to be commoned first, and only then is the `Phi`, which now has 2 identical inputs, transformed to that input (rather than having a new `CastPP` created at a different control). ------------- Commit messages: - more - test - fix Changes: https://git.openjdk.org/jdk/pull/25386/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25386&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351889 Stats: 99 lines in 3 files changed: 99 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25386.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25386/head:pull/25386 PR: https://git.openjdk.org/jdk/pull/25386 From duke at openjdk.org Thu May 22 09:10:12 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 22 May 2025 09:10:12 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes [v2] In-Reply-To: References: Message-ID: > The following nodes are added: > - MinV / MaxV > - AndV / OrV / XorV > - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV > - AddReductionVI / MulReductionVI David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: remove TEMP_DEF effect for dst ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25318/files - new: https://git.openjdk.org/jdk/pull/25318/files/e3c5f9f1..8126d0db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25318&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25318&range=00-01 Stats: 33 lines in 2 files changed: 0 ins; 12 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/25318.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25318/head:pull/25318 PR: https://git.openjdk.org/jdk/pull/25318 From rcastanedalo at openjdk.org Thu May 22 09:21:04 2025 From: rcastanedalo at openjdk.org
(Roberto Castañeda Lozano) Date: Thu, 22 May 2025 09:21:04 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v56] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 07:58:51 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
>> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - remove comment line for Roberto > - Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano Looks good, thank you for your patience through multiple design and review iterations! A couple of observations: - I measured the time to generate the code using this framework for [my case study](https://github.com/robcasloz/jdk/blob/JDK-8344942-template-testing/test/hotspot/jtreg/compiler/loopopts/TestArrayFillAntiDependenceTemplatedWithDelimiters.java) (using a fastdebug build) and it seems acceptable (in the milliseconds - same order of magnitude as executing it). The time to compile it using the compile framework is noticeable though (a couple of seconds). If we start adopting this methodology in the large, we might have to find ways to mitigate the increased test execution time. - I struggled to understand the interplay between names, scopes, hooks, etc. (e.g. in the `generateWithDataNamesAndScopes2` example). 
I think what makes it difficult is that it requires understanding the internals of the framework (order of expansion/evaluation etc.). But I guess this is unavoidable complexity for "advanced" fuzzing use cases only, and I trust that with some effort, the available documentation and examples would be enough to understand the details if necessary. - The `DataName` and `StructuralName` concepts seem to belong to a lower level of abstraction (Java language) compared to the rest of the framework's API (abstract language). Just an observation, since I imagine it is difficult (or feels unnecessary) to abstract these concepts when we need to express very Java-like things like mutability or inheritance. test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 42: > 40: * {@link DataName} when sampling later on. > 41: */ > 42: public record DataName(String name, DataName.Type type, boolean mutable, int weight) implements Name { Since `DataName` represents "fields and variables", you might consider renaming it to `VariableName` or `VarName` for clarity - fields can be considered variables too (as in [instance variables](https://en.wikipedia.org/wiki/Instance_variable)). This is just a minor, optional suggestion, in case you still have some fuel left for this PR ;) ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2860094632 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2102054527 From rcastanedalo at openjdk.org Thu May 22 09:21:08 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 May 2025 09:21:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v55] In-Reply-To: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com> References: <0jRlRRNQVdveMPwuxSXuX69ZZk5w5BpHeNtbMP03C2k=.b0830b58-f72e-43c0-bdaa-890748f962e7@github.com> Message-ID: On Wed, 21 May 2025 15:24:18 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > move order in tutorial test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 209: > 207: *

> 208: * Similarly, we may want to model method and class names, and possibly other structural names. We model > 209: * these names with {@link StructuralName}, which works analogous to {@link DataName}, except that they Suggestion: * these names with {@link StructuralName}, which works analogously to {@link DataName}, except that they test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 215: > 213: * When working with {@link DataName}s and {@link StructuralName}s, it is important to be aware of the > 214: * relevant scopes, as well as the execution order of the {@link Template} lambdas, as well as the evaluation > 215: * of the {@link Template#body} tokens. When a {@link Template} is rendered, its lambda is invoke. In the Suggestion: * of the {@link Template#body} tokens. When a {@link Template} is rendered, its lambda is invoked. In the test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 261: > 259: * // nested Template would observe an increment in the count. > 260: * anotherTemplate.asToken(), > 261: * // By this point, all methods are called, and the tokens generated. The Suggestion: * // By this point, all methods are called, and the tokens generated. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 608: > 606: * > 607: * @param tokens A list of tokens, which can be {@link String}s, boxed primitive types > 608: * (e.g. {@link Integer}), any {@link Token}, or {@link List}s For correct javadoc rendering: Suggestion: * (for example {@link Integer}), any {@link Token}, or {@link List}s test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 752: > 750: > 751: /** > 752: * Add a {@link DataName} in the current scope, i.e. 
the innermost of either For correct javadoc rendering: Suggestion: * Add a {@link DataName} in the current scope, that is the innermost of either test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 774: > 772: > 773: /** > 774: * Add a {@link DataName} in the current scope, i.e. the innermost of either For correct javadoc rendering: Suggestion: * Add a {@link DataName} in the current scope, that is the innermost of either test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 799: > 797: > 798: /** > 799: * Add a {@link StructuralName} in the current scope, i.e. the innermost of either For correct javadoc rendering: Suggestion: * Add a {@link StructuralName} in the current scope, that is the innermost of either test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 813: > 811: > 812: /** > 813: * Add a {@link StructuralName} in the current scope, i.e. the innermost of either For correct javadoc rendering: Suggestion: * Add a {@link StructuralName} in the current scope, that is the innermost of either test/hotspot/jtreg/compiler/lib/template_framework/TemplateBinding.java line 27: > 25: > 26: /** > 27: * To facilitate recursive uses of Templates, e.g. where a template uses To prevent `javadoc` from truncating the sentence in the Description column in the package summary view (`package-summary.html`), see https://stackoverflow.com/questions/18282086/how-tell-tell-javadoc-that-my-period-doesnt-end-a-sentence. 
Suggestion: * To facilitate recursive uses of Templates, for example where a template uses ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101857252 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101858376 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101862125 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101870441 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101865914 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101866887 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101867993 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101868739 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2101839097 From bkilambi at openjdk.org Thu May 22 09:28:55 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 22 May 2025 09:28:55 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> Message-ID: On Thu, 22 May 2025 06:17:40 GMT, Emanuel Peter wrote: >> Hi @eme64 I removed the `@Warmup` entirely and the test does pass on aarch64. Although I am a bit afraid to fully remove it as it could sometimes lead to the loop not being warm enough for c2 vectorization to kick in. I haven't tried with different values of the warmup iterations though. Do you think it's ok to remove it entirely? > > @Bhavana-Kilambi The TestFramework actually forces C2 compilation: > - runs warmup iterations, maybe C2 triggers automatically because there are enough iterations. 
> - Once warmup is over, the TestFramework checks if the method is already compiled, if not, it enqueues it. > - In the end, we know it is C2 compiled, which gives us the C2 IR we can match with. > > In my experience, having low warmup count works in most cases. Except when you need profiling data. If you have zero warmup, we basically have compilation with `-Xcomp`. > > So it really depends on your specific case. In general, I would avoid doing an `Xcomp` compilation / zero warmup, because then we do not test normal compilation with profiling. And compilation with profiling is more important I think. > > But in cases where you have a large loop in the test method, we would trigger OSR and normal compilation with profiling rather soon anyway. So lowering the warmup is ok. How many loop iterations do we need for OSR? > `product(intx, Tier4BackEdgeThreshold, 40000`. We could round that up to `100_000`, just to be sure. With `LEN = 2048`, you would thus only need about `50` invocations of the tests during warmup to reach C2 compilation. Hence, the current `@Warmup(10000)` is much too high, I think. You could cut down the runtime by about a factor of `100` here, if my math is correct :exploding_head: > > What do you think? Hi @eme64 Thanks for the details and suggestions. I tried with a `@Warmup(50)` (my calculation is 50 * 2048 = 102400 which is around 100_000) and the test passes on aarch64 (it passes even with 0 warmup though). Do you think we can go ahead with `@Warmup(50)`? Also, can I ask if any other tests failed on your side (they shouldn't though as I haven't touched any other code other than FP16)? 
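The warmup arithmetic discussed above can be sketched as a small stand-alone Java program. This is only an illustration under stated assumptions: the class and method names (`WarmupEstimate`, `invocationsNeeded`) are made up for this note, it assumes every test invocation contributes exactly `LEN` loop back-edges, and it ignores everything else the tiered compilation policy takes into account.

```java
// Hypothetical helper, not part of the TestFramework or of HotSpot.
class WarmupEstimate {
    // Smallest number of invocations whose combined loop iterations
    // reach the target back-edge count (ceiling division).
    static long invocationsNeeded(long targetBackEdges, long iterationsPerInvocation) {
        if (targetBackEdges < 0 || iterationsPerInvocation <= 0) {
            throw new IllegalArgumentException("expected target >= 0 and iterations > 0");
        }
        return (targetBackEdges + iterationsPerInvocation - 1) / iterationsPerInvocation;
    }

    public static void main(String[] args) {
        long target = 100_000; // Tier4BackEdgeThreshold (40000) rounded up, as in the mail above
        long len = 2048;       // LEN, the loop trip count per test invocation
        long n = invocationsNeeded(target, len);
        System.out.println(n + " invocations give " + (n * len) + " iterations");
    }
}
```

With these inputs the estimate lands just below the `@Warmup(50)` value tried above, since 50 * 2048 = 102400 already exceeds the rounded-up target.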
------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2900517830 From roland at openjdk.org Thu May 22 09:30:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 09:30:33 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code Message-ID: In the test case, a non escaping array is initialized by an `arraycopy` that uses this array as source and destination. Following the `arraycopy`, one of the element of the array is tested for `null`. That null check is constant folded to always `null` by escape analysis. As I understand, the `Allocate` for the array should be marked by EA as destination of an array copy. That state should then be propagated by EA to uses and all destinations of an array copy should be marked as unknown value. But EA has logic that explicitly skips the case where an `ArrayCopy` has same source and destination. Removing that logic fixes the failure. ------------- Commit messages: - whitespaces - test - fix Changes: https://git.openjdk.org/jdk/pull/25389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25389&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8356989 Stats: 69 lines in 2 files changed: 61 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25389/head:pull/25389 PR: https://git.openjdk.org/jdk/pull/25389 From roland at openjdk.org Thu May 22 09:30:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 09:30:33 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. 
As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. @vnkozlov you added that code with 7147744. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25389#issuecomment-2900514521 From rehn at openjdk.org Thu May 22 09:31:56 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 22 May 2025 09:31:56 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: On Thu, 22 May 2025 06:56:36 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> While working on https://github.com/openjdk/jdk/pull/25252, I notice: >> - Major op code was just repeat >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Rd/Rs->Rs2/Rs1 > - Merge branch 'master' into asm_fixes > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes I'm fine either way. But as all other instructions plainly have Rd/Rs1/Rs2, I think we should address all or none. E.g. maybe it's better to handle this in a separate issue? 
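For readers following the Rs1/Rs2 naming question, here is a minimal stand-alone Java sketch of how a RISC-V `sd` (S-type) instruction word is put together. It is not the HotSpot assembler: the class `StoreEncoding` and the method `encodeSd` are hypothetical, and the parameter names only mirror the `Rs2_value` / `Rs1_address` idea floated earlier in this thread. The field layout is the one from the RISC-V base ISA, where rs2 holds the value to store and rs1 holds the base address.

```java
// Hypothetical illustration only, not code from the JDK.
class StoreEncoding {
    static final int OPCODE_STORE = 0b0100011; // major opcode shared by sb/sh/sw/sd
    static final int FUNCT3_SD = 0b011;        // width field selecting a 64-bit store

    // Encodes "sd rs2Value, offset(rs1Address)".
    static int encodeSd(int rs2Value, int rs1Address, int offset) {
        if (rs2Value < 0 || rs2Value > 31 || rs1Address < 0 || rs1Address > 31) {
            throw new IllegalArgumentException("register numbers must be in 0..31");
        }
        if (offset < -2048 || offset > 2047) {
            throw new IllegalArgumentException("offset must fit in a signed 12-bit immediate");
        }
        int imm = offset & 0xFFF;               // 12-bit two's complement immediate
        return ((imm >>> 5) << 25)              // imm[11:5]
             | (rs2Value << 20)                 // rs2: the value being stored
             | (rs1Address << 15)               // rs1: the base address
             | (FUNCT3_SD << 12)                // funct3: store width
             | ((imm & 0x1F) << 7)              // imm[4:0]
             | OPCODE_STORE;
    }

    public static void main(String[] args) {
        // sd x1, 8(x2): store register x1 at address x2 + 8
        System.out.println(String.format("%08x", encodeSd(1, 2, 8)));
    }
}
```

Naming the parameters after their role (value, address) rather than after their position in the encoding is what makes a swapped-argument mistake visible at the call site, which seems to be the readability concern raised above.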
------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2900526965 From epeter at openjdk.org Thu May 22 09:33:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 09:33:21 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Roberto Casta?eda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/5c277631..bd79554d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=56 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=55-56 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From aph at openjdk.org Thu May 22 09:50:58 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 22 May 2025 09:50:58 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation There's a lot of confusion about this. There is no requirement that all bytes before the non-writable address have been written when hitting a signal. Behaving nicely when writing beyond allocated memory is "best effort" only: we're trying to be nice, that's all. The atomicity requirement is here, in the specification of `Unsafe::setMemory`: *

The stores are in coherent (atomic) units of a size determined * by the address and length parameters. If the effective address and * length are all even modulo 8, the stores take place in 'long' units. * If the effective address and length are (resp.) even modulo 4 or 2, * the stores take place in units of 'int' or 'short'. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2900591094 From mdoerr at openjdk.org Thu May 22 09:51:52 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 May 2025 09:51:52 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes [v2] In-Reply-To: References: Message-ID: <_6tD-WAMzku7CX3WVuXT49Y_pqFADlKeVWEuhQ-of7Q=.07cbe584-6d3d-4e00-841f-1ea2b8b9de51@github.com> On Thu, 22 May 2025 09:10:12 GMT, David Linus Briemann wrote: >> The following nodes are added: >> - MinV / MaxV >> - AndV / OrV / XorV >> - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV >> - AddReductionVI / MulReductionVI > > David Linus Briemann has updated the pull request incrementally with one additional commit since the last revision: > > remove TEMP_DEF effect for dst LGTM. Thanks! ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25318#pullrequestreview-2860558528 From mdoerr at openjdk.org Thu May 22 09:54:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 May 2025 09:54:55 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. 
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation Ah, thanks! I was not aware of that. That means the current implementation is probably wrong in some cases (mvc generated by gcc). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2900603929 From rcastanedalo at openjdk.org Thu May 22 10:00:56 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 22 May 2025 10:00:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes?
[This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2900619263 From shade at openjdk.org Thu May 22 10:15:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 22 May 2025 10:15:07 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence In-Reply-To: References: Message-ID: On Tue, 20 May 2025 00:49:49 GMT, Vladimir Ivanov wrote: > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. 
It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks This is an impressive piece of work. It still makes me yearn for a simpler solution. (I am impressed how much it snowballed from my original `@DontInline` implementation!) > Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high. How bad is it? `MemBarCPUOrder` pinches all memory, so I assume this breaks a lot of optimizations when `RF` is sitting in the hot loop? I remember we went through a similar exercise with `Blackholes`: [JDK-8296545](https://bugs.openjdk.org/browse/JDK-8296545) -- and decided to pinch only the control. I am _guessing_ this is not enough to fix RF, or is it? Some other comments: src/hotspot/share/opto/block.cpp line 189: > 187: !get_node(end_idx)->is_Mach() && > 188: !get_node(end_idx)->is_BoxLock() && > 189: !(get_node(end_idx)->is_ReachabilityFence() && C->print_assembly())) { Um. So this fairly generic method is predicated on whether a diagnostic VM option is enabled. Which risks that the compiler behavior with/without printing assembly is different?
Which might hide the very issues we are trying to diagnose with printing assembly? Can we handle `RF` here without checking for `print_assembly`? This will also obviate a need to pass `Compile* C` around. src/hotspot/share/opto/callnode.cpp line 969: > 967: projs->exobj = e; > 968: } else { > 969: // exception table for rethrow case Feels like we want to assert other values of `e->in(0)->as_CatchProj()->_con` here? From the switch statement in the previous hunk, there seems to be `fall_through_index` that is not "rethrow case" (or is it?). src/hotspot/share/opto/classes.cpp line 50: > 48: #include "opto/subtypenode.hpp" > 49: #include "opto/vectornode.hpp" > 50: #include "utilities/macros.hpp" This `macros.hpp` include is for `#if INCLUDE_SHENANDOAHGC` a line below, please keep it intact. src/hotspot/share/opto/compile.cpp line 2586: > 2584: C->node_hash()->clear(); > 2585: > 2586: // A method with only infinite loops has no edges entering loops from root Redundant. src/hotspot/share/opto/compile.cpp line 3912: > 3910: // requires that the walk visits a node's inputs before visiting the node. > 3911: > 3912: static bool has_non_debug_uses(Node* n) { This got inserted right between `------------------------------final_graph_reshaping_walk--------------------` comment and the `final_graph_reshaping_walk` implementation. Also, put `has_non_debug_uses` into `Compile`? src/hotspot/share/opto/compile.cpp line 3968: > 3966: return; > 3967: > 3968: // Go over ReachabilityFence nodes to skip DecodeN nodes for referents. This is a cute optimization. Does it happen in our code anywhere? I would have expected `DecodeN` to be near the heap loads, and suppose `RF` is mostly called on locals, which are already uncompressed? 
src/hotspot/share/opto/compile.cpp line 3995: > 3993: for (int j = start; j < end; j++) { > 3994: Node* in = n->in(j); > 3995: if (in->is_DecodeNarrowPtr() && (is_uncommon || has_non_debug_uses(in))) { The comment says we can skip when node is only referenced in debug info. Here we skip when there _are_ non-debug uses. Did you mean `!has_non_debug_uses(in)`? src/hotspot/share/opto/parse1.cpp line 1225: > 1223: > 1224: if (StressReachabilityFences) { > 1225: // Keep all oop arguments alive until method return. Comment says "arguments", but we save locals. Aren't arguments on "stack" in `JVMState`? For stress mode, would make sense to hook up both locals/stack from `JVMState`, maybe? src/hotspot/share/opto/reachability.cpp line 211: > 209: } > 210: } > 211: return found; `found` is always `false` here. I don't think this function even needs a return value, judging by the uses. src/hotspot/share/opto/reachability.cpp line 431: > 429: } > 430: } > 431: redundant_rfs.push(rf); I see `PhaseIdealLoop::optimize_reachability_fences` asks for `redundant_rfs.member(rf)` before going into this analysis. Is it because we can have duplicate RFs in the `C->reachability_fence` list? Can we have duplicate here? Should we check `.member(rf)` here as well? src/hotspot/share/opto/reachability.hpp line 48: > 46: // Fake the incoming arguments mask for blackholes: accept all registers > 47: // and all stack slots. This would avoid any redundant register moves > 48: // for blackhole inputs. Still mentions blackholes. src/java.base/share/classes/java/lang/ref/Reference.java line 662: > 660: * @since 9 > 661: */ > 662: @IntrinsicCandidate Sounds like we also want to restore `@DontInline` to cover the case when intrinsic is not available / disabled for some compiler. I vaguely remember some intrinsic handling code checks whether method is prohibited from inlining (maybe affects only global `-XX:-Inline`, not sure), so it might be as straightforward. 
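For context, a minimal sketch of the usage pattern that `Reference.reachabilityFence()` exists for. It is not taken from the PR: the `Resource` class, its fields, and `sumWithFence` are hypothetical names chosen for illustration, and the sketch assumes some cleanup mechanism elsewhere could call `release()` once the object becomes unreachable.

```java
import java.lang.ref.Reference;

public class ReachabilityFenceSketch {
    // Hypothetical stand-in for an object whose backing state is released
    // by a cleaner or finalizer once the object becomes unreachable.
    static final class Resource {
        private final int[] payload = {1, 2, 3};
        private volatile boolean released;

        // Assumed to be called by some cleanup mechanism (not shown here).
        void release() {
            released = true;
        }

        int sum() {
            try {
                int total = 0;
                for (int v : payload) {
                    if (released) {
                        throw new IllegalStateException("used after release");
                    }
                    total += v;
                }
                return total;
            } finally {
                // Keeps 'this' strongly reachable at least until this point,
                // so cleanup tied to its reachability cannot start earlier.
                Reference.reachabilityFence(this);
            }
        }
    }

    // Small helper so the sketch can be exercised without extra setup.
    static int sumWithFence() {
        return new Resource().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithFence());
    }
}
```

The try/finally shape mirrors the pattern recommended in the method's Javadoc. Whether a given JIT compiler would actually shorten the object's live range without the fence is exactly the kind of behavior this PR is concerned with.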
------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-2860354171 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102055970 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102067844 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102002006 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102069381 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102071571 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102080895 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102104522 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102130716 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102147078 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102170286 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102011606 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102028883 From shade at openjdk.org Thu May 22 10:15:07 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 22 May 2025 10:15:07 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:26:55 GMT, Aleksey Shipilev wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. 
The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > src/hotspot/share/opto/compile.cpp line 3968: > >> 3966: return; >> 3967: >> 3968: // Go over ReachabilityFence nodes to skip DecodeN nodes for referents. > > This is a cute optimization. 
Does it happen in our code anywhere? I would have expected `DecodeN` to be near the heap loads, and suppose `RF` is mostly called on locals, which are already uncompressed? Now that I read the next hunk, should `is_DecodeN` be `is_DecodeNarrowPtr` to capture class loads (however unlikely that one is)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2102106957 From epeter at openjdk.org Thu May 22 10:37:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 10:37:06 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:06:34 GMT, Roberto Casta?eda Lozano wrote: >> A few more documentation suggestions, will continue reviewing this changeset over the next days. > >> @robcasloz I addressed all your comments :) > > Thanks @eme64! @robcasloz Thanks for all the suggestions and comments! > Looks good, thank you for your patience through multiple design and review iterations! > > A couple of observations: > > * I measured the time to generate the code using this framework for [my case study](https://github.com/robcasloz/jdk/blob/JDK-8344942-template-testing/test/hotspot/jtreg/compiler/loopopts/TestArrayFillAntiDependenceTemplatedWithDelimiters.java) (using as fastdebug build) and it seems acceptable (in the milliseconds - same order of magnitude as executing it). The time to compile it using the compile framework is noticeable though (a couple of seconds). If we start adopting this methodology in the large, we might have to find ways to mitigate the increased test execution time. In my experience, compilation has a significant overhead. But the biggest overhead is **starting** the compilation, and gets only slightly slower with a larger file. So if you have a test that is big enough, eventually running the compiled code is the biggest factor. > * I struggled understanding the interplay between names, scopes, hooks, etc. (e.g. 
in the `generateWithDataNamesAndScopes2` example). I think what makes it difficult is that it requires understanding the internals of the framework (order of expansion/evaluation etc.). But I guess this is unavoidable complexity for "advanced" fuzzing use cases only, and I trust that with some effort, the available documentation and examples would be enough to understand the details if necessary. Yes, the interplay between `DataName`s and scopes is non-trivial. It all comes from the fact that Templates are evaluated in two passes: lambda evaluation and token evaluation. My string based approach only had a single pass, that was an advantage there, then the order is very clear. An alternative is a StringBuilder based approach, where you have to serialize everything with `append` style calls. But that clutters the interface. The current approach is to have a comma separated list. But that means that those comma separated tokens are generated during the lambda evaluation, and then in a second pass, those tokens are evaluated. It's a nice interface, but leads to some possible confusion about the order. > * The `DataName` and `StructuralName` concepts seem to belong to a lower level of abstraction (Java language) compared to the rest of the framework's API (abstract language). Just an observation, since I imagine it is difficult (or feels unnecessary) to abstract these concepts when we need to express very Java-like things like mutability or inheritance. Right, Templates are just about string generation, and with the scopes/hook insertions it becomes sort of an abstract language. I suppose one could have a language without variables, class and function names... but that would be hard to imagine. I think that `DataName` and `StructuralName` are still at a very high level of abstraction, they could work for the languages I know at least. Plus: all we are currently trying to target is Java, and maybe Jasm. Maybe possibly one day other JVM languages. 
My goal was to keep the basic framework as simple as possible, and build everything Java in follow-up RFEs, to build up a library of Templates. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2900729675 From epeter at openjdk.org Thu May 22 10:41:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 10:41:09 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v16] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 17:06:34 GMT, Roberto Casta?eda Lozano wrote: >> A few more documentation suggestions, will continue reviewing this changeset over the next days. > >> @robcasloz I addressed all your comments :) > > Thanks @eme64! @robcasloz Another comment about scopes and `DataName`s. I hope that most regular users don't have to directly interact with `addDataName` and `dataNames()...sample()`, but that these are wrapped into Templates. The hope is that Templates like `defineField(name, type)` and `loadValue.asToken(type)` could wrap the API calls, so that the user does not have to interact with `DataName` directly, but more with Java concepts, such as adding fields, variables, getting some random value (maybe constant, maybe field load, maybe variable load, ...). ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2900738850 From mli at openjdk.org Thu May 22 10:41:54 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 May 2025 10:41:54 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: On Thu, 22 May 2025 06:56:36 GMT, Robbin Ehn wrote: >> Hi, please consider. 
>> >> While working on https://github.com/openjdk/jdk/pull/25252, I noticed: >> - Major op code was just repeated >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Rd/Rs->Rs2/Rs1 > - Merge branch 'master' into asm_fixes > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25253#pullrequestreview-2860726465 From mli at openjdk.org Thu May 22 10:41:56 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 May 2025 10:41:56 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: On Thu, 22 May 2025 09:29:00 GMT, Robbin Ehn wrote: > I'm fine either way. But as all other instructions plainly have Rd/Rs1/rs2, I think we should address all or none. E.g. maybe it's better to handle in a separate issue? Sure, makes sense to me.
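To make the rs1/rs2 and width points from the PR description concrete, here is a minimal sketch of the RISC-V S-type store encoding as laid out in the base ISA. It is not HotSpot's assembler code: the class, constant, and method names are hypothetical, and registers are passed as plain integer numbers.

```java
public class RiscvStoreEncodingSketch {
    // Major opcode shared by the integer stores (sb, sh, sw, sd).
    static final int OPCODE_STORE = 0b0100011;

    // funct3 selects the access width; named constants instead of raw binary.
    static final int WIDTH_B = 0b000; // sb: 8-bit
    static final int WIDTH_H = 0b001; // sh: 16-bit
    static final int WIDTH_W = 0b010; // sw: 32-bit
    static final int WIDTH_D = 0b011; // sd: 64-bit

    // S-type layout: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode.
    // rs2 is the value being stored, rs1 is the base address register.
    static int encodeStore(int width, int rs2, int rs1, int imm12) {
        int immLo = imm12 & 0x1f;        // imm[4:0]
        int immHi = (imm12 >> 5) & 0x7f; // imm[11:5]
        return (immHi << 25) | (rs2 << 20) | (rs1 << 15)
             | (width << 12) | (immLo << 7) | OPCODE_STORE;
    }

    public static void main(String[] args) {
        // sd x1, 8(x2), i.e. "sd ra, 8(sp)"
        System.out.printf("0x%08x%n", encodeStore(WIDTH_D, 1, 2, 8));
        // sw x5, 8(x10)
        System.out.printf("0x%08x%n", encodeStore(WIDTH_W, 5, 10, 8));
    }
}
```

Keeping the stored value (`rs2`) and the base register (`rs1`) as separately named parameters, in encoding order, is one way to avoid the kind of operand mix-up the PR description mentions.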
------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2900737088 From epeter at openjdk.org Thu May 22 10:42:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 10:42:55 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> Message-ID: On Thu, 22 May 2025 06:17:40 GMT, Emanuel Peter wrote: >> Hi @eme64 I removed the `@Warmup` entirely and the test does pass on aarch64. Although I am a bit afraid to fully remove it as it could sometimes lead to the loop not being warm enough for c2 vectorization to kick in. I haven't tried with different values of the warmup iterations though. Do you think it's ok to remove it entirely? > > @Bhavana-Kilambi The TestFramework actually forces C2 compilation: > - runs warmup iterations, maybe C2 triggers automatically because there are enough iterations. > - Once warmup is over, the TestFramework checks if the method is already compiled, if not, it enqueues it. > - In the end, we know it is C2 compiled, which gives us the C2 IR we can match with. > > In my experience, having low warmup count works in most cases. Except when you need profiling data. If you have zero warup, we basically have compilation with `-Xcomp`. > > So it really depends on your specific case. In general, I would avoid doing an `Xcomp` compilation / zero warmup, because then we do not test normal compilation with profiling. And compilation with profiling is more important I think. > > But in cases where you have a large loop in the test method, we would trigger OSR and normal compilation with profiling rather soon anyway. So lowering the warmup is ok. 
How many loop iterations do we need for OSR? > `product(intx, Tier4BackEdgeThreshold, 40000`. We could round that up to `100_000`, just to be sure. With `LEN = 2048`, you would thus only need about `50` invocations of the tests during warmup to reach C2 compilation. Hence, the current `@Warmup(10000)` is much too high, I think. You could cut down the runtime by about a factor of `100` here, if my math is correct :exploding_head: > > What do you think? > Hi @eme64 Thanks for the details and suggestions. I tried with a `@Warmup(50)` (my calculation is 50 * 2048 = 102400 which is around 100_000) and the test passes on aarch64 (it passes even with 0 warmup though). Do you think we can go ahead with `@Warmup(50)`? Sounds good :) > Also, can I ask if any other tests failed on your side (they shouldn't though as I haven't touched any other code other than FP16)? There was no other related test failure :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2900746291 From duke at openjdk.org Thu May 22 10:55:19 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 22 May 2025 10:55:19 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v17] In-Reply-To: References: Message-ID: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic reduces unsafe setmemory time by quite a lot > > On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the results below > > before the patch > >
> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/op
> MemorySegmentZeroUnsafe.panama      false      63  avgt   30...
Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: add new line for bind ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23890/files - new: https://git.openjdk.org/jdk/pull/23890/files/fe8258ee..87165277 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23890&range=15-16 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23890.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23890/head:pull/23890 PR: https://git.openjdk.org/jdk/pull/23890 From duke at openjdk.org Thu May 22 10:55:20 2025 From: duke at openjdk.org (Anjian-Wen) Date: Thu, 22 May 2025 10:55:20 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v16] In-Reply-To: References: Message-ID: <3MURBnqVTRl_gJJjCRWIM87Et6oXeEaZEAeSVaw1Wsw=.9b6b9aa4-8284-48d8-a2d4-be87ce4e2ee1@github.com> On Thu, 22 May 2025 07:40:58 GMT, Fei Yang wrote: >> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> delete the path for code size and test > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1700: > >> 1698: __ addi(dest, dest, 1); >> 1699: __ subi(count, count, 1); >> 1700: __ bind(L_skip_align1); > > Leave a new line before this bind. done > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 1707: > >> 1705: __ addi(dest, dest, 2); >> 1706: __ subi(count, count, 2); >> 1707: __ bind(L_skip_align2); > > Leave a new line before this bind. 
done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2102260790 PR Review Comment: https://git.openjdk.org/jdk/pull/23890#discussion_r2102260984 From epeter at openjdk.org Thu May 22 11:07:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 11:07:11 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v25] In-Reply-To: References: Message-ID: On Thu, 15 May 2025 14:48:39 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix typo > > Thank you for the refactoring and your patience. I like the result and its simplicity a lot. > > I found a few typos, but otherwise it looks excellent. @mhaessig Not sure if you want to re-review? @chhagedorn I'll wait for your review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2900811651 From epeter at openjdk.org Thu May 22 11:07:13 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 11:07:13 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v56] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:14:24 GMT, Roberto Castañeda Lozano wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - remove comment line for Roberto >> - Apply suggestions from code review >> >> Co-authored-by: Roberto Castañeda Lozano > > test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 42: > >> 40: * {@link DataName} when sampling later on. >> 41: */ >> 42: public record DataName(String name, DataName.Type type, boolean mutable, int weight) implements Name { > > Since `DataName` represents "fields and variables", you might consider renaming it to `VariableName` or `VarName` for clarity - fields can be considered variables too (as in [instance variables](https://en.wikipedia.org/wiki/Instance_variable)).
This is just a minor, optional suggestion, in case you still have some fuel left for this PR ;) Here are some arguments for `DataName` vs `VariableName`: - If you hear `VariableName`, you may misunderstand that it means variables only. But what about method arguments and fields? - We also want to cover constants, i.e. final fields and variables. Someone may say that constants should not be called variables. - `DataName` is abstract enough, and the user may be slightly irritated because they don't exactly understand right away what it means. That way, they would end up consulting the API description, rather than making their own (possibly false) assumptions. @chhagedorn has no strong preference, but slightly leans toward `DataName`. @mhaessig came up with `DataName`, and is still in favor of it. I feel the same as @chhagedorn, with a slight preference for `DataName`. And of course it would cost me a few hours to implement the change, making sure all the code comments are still correct ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2102282804 From rcastanedalo at openjdk.org Thu May 22 11:20:06 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 22 May 2025 11:20:06 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:33:21 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
>> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Casta?eda Lozano Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2860831257 From rcastanedalo at openjdk.org Thu May 22 11:20:07 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 22 May 2025 11:20:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v56] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 11:03:48 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 42: >> >>> 40: * {@link DataName} when sampling later on. >>> 41: */ >>> 42: public record DataName(String name, DataName.Type type, boolean mutable, int weight) implements Name { >> >> Since `DataName` represents "fields and variables", you might consider renaming it to `VariableName` or `VarName` for clarity - fields can be considered variables too (as in [instance variables](https://en.wikipedia.org/wiki/Instance_variable)). This is just a minor, optional suggestion, in case you still have some fuel left for this PR ;) > > Here some arguments for `DataName` vs `VariableName` : > - If you hear `VariableName` , you may misunderstand that it means variables only. But what about method arguments and fields? > - We also want to cover constants, i.e. final fields and variables. Someone may say that constants should not be called variables. > - `DataName` is abstract enough, and the user may be slightly irritated because they don't exactly understand right away what it means. 
That way, they would end up consulting the API description, rather than making their own (possibly false) assumptions. > > @chhagedorn has no strong preference, but slightly leans toward `DataName`. > @mhaessig came up with `DataName`, and is for it still. > I feel the same as @chhagedorn , with a slight preference to `DataName`. And of course it would cost me a few hours to implement the change, making sure all the code comments are still correct ;) Fair enough, thanks for considering the suggestion! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2102304496 From mdoerr at openjdk.org Thu May 22 11:41:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 22 May 2025 11:41:55 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 
0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation The new proposal is probably ok, then. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24480#pullrequestreview-2860895458 From fjiang at openjdk.org Thu May 22 11:46:52 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 22 May 2025 11:46:52 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: On Thu, 22 May 2025 06:56:36 GMT, Robbin Ehn wrote: >> Hi, please consider. >> >> While working on https://github.com/openjdk/jdk/pull/25252, I notice: >> - Major op code was just repeat >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Rd/Rs->Rs2/Rs1 > - Merge branch 'master' into asm_fixes > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes Thanks! ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/25253#pullrequestreview-2860909072 From bkilambi at openjdk.org Thu May 22 11:58:18 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 22 May 2025 11:58:18 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v5] In-Reply-To: References: Message-ID: <-LdLobbf_wMuaEd7e4ietBnJHhDBJFUWk7Hw2EdnmuY=.c7fa1951-8874-4f56-afe7-75341e748899@github.com> > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. 
> > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: Reduce @Warmup from 10000 to 50 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25096/files - new: https://git.openjdk.org/jdk/pull/25096/files/bb6235aa..710edeec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25096&range=03-04 Stats: 11 lines in 1 file changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/25096.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25096/head:pull/25096 PR: https://git.openjdk.org/jdk/pull/25096 From bkilambi at openjdk.org Thu May 22 11:58:18 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 22 May 2025 11:58:18 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> Message-ID: <4DqS6Sfys6GAONfkAQk8KNGyuqD7UvgJ8t5taO99goU=.907d894e-872a-45de-b9cb-aaf44a3ffd35@github.com> On Thu, 22 May 2025 10:40:39 GMT, Emanuel Peter wrote: >> @Bhavana-Kilambi The TestFramework actually forces C2 compilation: >> - runs warmup iterations, maybe C2 triggers automatically because there are enough iterations. >> - Once warmup is over, the TestFramework checks if the method is already compiled, if not, it enqueues it. >> - In the end, we know it is C2 compiled, which gives us the C2 IR we can match with. >> >> In my experience, having low warmup count works in most cases. Except when you need profiling data. 
If you have zero warmup, we basically have compilation with `-Xcomp`. >> >> So it really depends on your specific case. In general, I would avoid doing an `Xcomp` compilation / zero warmup, because then we do not test normal compilation with profiling. And compilation with profiling is more important I think. >> >> But in cases where you have a large loop in the test method, we would trigger OSR and normal compilation with profiling rather soon anyway. So lowering the warmup is ok. How many loop iterations do we need for OSR? >> `product(intx, Tier4BackEdgeThreshold, 40000`. We could round that up to `100_000`, just to be sure. With `LEN = 2048`, you would thus only need about `50` invocations of the tests during warmup to reach C2 compilation. Hence, the current `@Warmup(10000)` is much too high, I think. You could cut down the runtime by about a factor of `100` here, if my math is correct :exploding_head: >> >> What do you think? > >> Hi @eme64 Thanks for the details and suggestions. I tried with `@Warmup(50)` (my calculation is 50 * 2048 = 102400 which is around 100_000) and the test passes on aarch64 (it passes even with 0 warmup though). Do you think we can go ahead with `@Warmup(50)`? > > Sounds good :) > >> Also, can I ask if any other tests failed on your side (they shouldn't though as I haven't touched any other code other than FP16)? > > There was no other related test failure :) Hi @eme64 I have updated the testcase. Could you please test it now if it works? Thanks!
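The warmup estimate in this exchange is plain arithmetic, and a minimal sketch can reproduce it. The class and method names below are hypothetical, and the sketch assumes that every invocation of the test method contributes exactly `LEN` loop back-edges; the real tiered-compilation policy has more inputs than this single threshold, so treat the result as a rough lower bound rather than a guarantee.

```java
public class WarmupEstimate {
    // Smallest number of invocations such that
    // invocations * iterationsPerInvocation >= targetBackEdges (ceiling division).
    static long invocationsNeeded(long targetBackEdges, long iterationsPerInvocation) {
        return (targetBackEdges + iterationsPerInvocation - 1) / iterationsPerInvocation;
    }

    public static void main(String[] args) {
        long roundedTarget = 100_000; // Tier4BackEdgeThreshold (40000) rounded up, as in the thread
        long len = 2048;              // LEN, the loop trip count per test invocation

        // 49 invocations already exceed the rounded target; the thread rounds this to 50.
        System.out.println(invocationsNeeded(roundedTarget, len)); // 49
        // Back-edges accumulated by @Warmup(50).
        System.out.println(50 * len); // 102400
    }
}
```

With the unrounded threshold of 40000 the same formula gives 20 invocations, so `@Warmup(50)` leaves a comfortable margin under these assumptions.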
------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2900946724 From epeter at openjdk.org Thu May 22 12:02:15 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 12:02:15 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: <4DqS6Sfys6GAONfkAQk8KNGyuqD7UvgJ8t5taO99goU=.907d894e-872a-45de-b9cb-aaf44a3ffd35@github.com> References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> <4DqS6Sfys6GAONfkAQk8KNGyuqD7UvgJ8t5taO99goU=.907d894e-872a-45de-b9cb-aaf44a3ffd35@github.com> Message-ID: On Thu, 22 May 2025 11:54:35 GMT, Bhavana Kilambi wrote: >>> Hi @eme64 Thanks for the details and suggestions. I tried with `@Warmup(50)` (my calculation is 50 * 2048 = 102400 which is around 100_000) and the test passes on aarch64 (it passes even with 0 warmup though). Do you think we can go ahead with `@Warmup(50)`? >> >> Sounds good :) >> >>> Also, can I ask if any other tests failed on your side (they shouldn't though as I haven't touched any other code other than FP16)? >> >> There was no other related test failure :) > > Hi @eme64 I have updated the testcase. Could you please test it now if it works? Thanks! @Bhavana-Kilambi launched! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2900968904 From mli at openjdk.org Thu May 22 12:41:55 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 22 May 2025 12:41:55 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v14] In-Reply-To: References: <-Ig2biJjwMoR79hyYfSNxJLqavcMzVyLFZvnV0J_t90=.4eb702b5-2cbd-40c6-81fd-744a2fe98acf@github.com> Message-ID: On Wed, 21 May 2025 12:43:46 GMT, Hamlin Li wrote: >>> > Thanks for your review! I think the above test results may not fully reflect the difference in the impact of aligned and unaligned on the tail?
I understand that if the dest address is aligned, the above aligned section has 0 to 4 fewer store instructions than the following section. I can remove it and test jmh to see how it performs >>> >>> Based on your last jmh data (check `MemorySegmentFillUnsafe.unsafe true 7` and `MemorySegmentFillUnsafe.unsafe false 7`, and others <= 7), they're the same. I guess the pipeline and store buffer deal with these continuous stores well enough. >> >> The last change to the code after L_fill_elements makes the byte stores one by one, so it seems that 'true' or 'false' may not affect it when the count is less than or equal to 7? >> I think the above align section is like a fast path which can reduce instructions when the count is larger than 7, so maybe we should check the result when count > 7? Besides, I'm testing that right now (deleting the align tail part), we can find out later > >> The last change to the code after L_fill_elements makes the byte stores one by one, so it seems that 'true' or 'false' may not affect it when the count is less than or equal to 7? > > Ah, I see. > >> I think the above align section is like a fast path which can reduce instructions when the count is larger than 7, so maybe we should check the result when count > 7? Besides, I'm testing that right now (deleting the align tail part), we can find out later > > Thanks, let's see the test result. > @Hamlin-Li The original branch is rarely taken, so it is almost never tested, and the performance impact of the "fast" path is relatively small. I think you are right, it is suitable to be deleted Thanks for testing! As there is some regression, maybe it's better to keep it. Looks good to me.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2901081836 From duke at openjdk.org Thu May 22 12:47:10 2025 From: duke at openjdk.org (duke) Date: Thu, 22 May 2025 12:47:10 GMT Subject: Withdrawn: 8342662: C2: Add new phase for backend-specific lowering In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 04:11:03 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21599 From rehn at openjdk.org Thu May 22 13:15:57 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 22 May 2025 13:15:57 GMT Subject: RFR: 8357056: RISC-V: Asm fixes - load/store width [v3] In-Reply-To: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> References: <0nyEXe_gpWn158OHmSJMEAPk_MMn9EXLctWeipKZQd4=.b89cbb07-50a5-4466-862e-951d7a6a9059@github.com> Message-ID: On Thu, 22 May 2025 06:56:36 GMT, Robbin Ehn wrote: >> Hi, please consider. 
>> >> While working on https://github.com/openjdk/jdk/pull/25252, I notice: >> - Major op code was just repeat >> - Width coded in binary >> - Stores have mixed up rs1 and rs2 >> - Bonus, fsd used a macro for no reason >> >> I think this improves readability. >> >> Tested tier1 >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Rd/Rs->Rs2/Rs1 > - Merge branch 'master' into asm_fixes > - Fixed flh/flw/fld > - Merge branch 'master' into asm_fixes > - Fixes Thanks all for reviewing! I created https://bugs.openjdk.org/browse/JDK-8357566. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25253#issuecomment-2901185959 From jbhateja at openjdk.org Thu May 22 13:21:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 13:21:54 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: <1eBhyC6b5OkigQi60_Zh_0b9Sjwh_nEZ5797xehEBzI=.a1dd4a13-7b37-470d-a36a-c8561365ad69@github.com> On Wed, 21 May 2025 20:02:27 GMT, Sandhya Viswanathan wrote: >>> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! :) >> >> Please use the latest version > >> > @jatin-bhateja @sviswa7 Can you explain the impact of the `EVEX_HVM`, `EVEX_QVM` etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct? >> >> @eme64 In EVEX the displacement for memory in the addressing mode is encoded using compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc denote tuple type and are used to determine the scaling factor for displacement. 
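A minimal sketch of why a wrong tuple type only shows up with a non-zero displacement, assuming the usual compressed disp8*N rule (the stored byte is `displacement / N`, and the hardware multiplies it back by the `N` that the instruction actually implies). The class and method names are hypothetical and the two scale values are illustrative, not taken from the patch under review.

```java
public class Disp8Sketch {
    // True if disp can be stored as a compressed disp8 for scale n:
    // it must be a multiple of n and the quotient must fit in a signed byte.
    static boolean fitsDisp8N(int disp, int n) {
        if (disp % n != 0) {
            return false;
        }
        int scaled = disp / n;
        return scaled >= -128 && scaled <= 127;
    }

    // Byte the assembler would emit, given the scale it believes applies.
    static int encode(int disp, int assumedN) {
        return disp / assumedN;
    }

    // Displacement the hardware reconstructs, given the scale the instruction really has.
    static int decode(int disp8, int actualN) {
        return disp8 * actualN;
    }

    public static void main(String[] args) {
        int assemblerN = 32; // hypothetical scale derived from an incorrect tuple type
        int hardwareN = 64;  // hypothetical scale the instruction actually uses

        // Non-zero displacement: the effective address is off by a factor of two.
        System.out.println(decode(encode(256, assemblerN), hardwareN)); // 512
        // Zero displacement: the mismatch is invisible.
        System.out.println(decode(encode(0, assemblerN), hardwareN)); // 0
    }
}
```

This is also why a test that only uses zero offsets would not catch such a bug; address shapes with varied non-zero offsets are needed.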
Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in [Intel SDM Volume 2](https://cdrdv2.intel.com/v1/dl/getContent/671110). So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is non zero. > > For testing, the best way would be to create a SIMD instruction encoding test tool on similar lines as https://github.com/openjdk/jdk/commit/52d752c43b3a9935ea97051c39adf381084035cc in a separate future PR. > @sviswa7 Thanks for the explanations! Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors? > > I'll re-run testing now, just to be sure. Hi @eme64 , On targets with AVX512 features, compressed disp8 encoding is not an optional feature, an instruction with a memory operand has a displacement which is a multiple of scale (N determined using vector length, lane size embedded broadcast flag etc) then EVEX encoding always records compressed displacement i.e. effective displ = displacement / N. I agree with the suggestion of adding test points in the assembler test tool in a separate follow-up patch as it's an activity on its own, here is the JBS tracker for it https://bugs.openjdk.org/browse/JDK-8357567 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2901204755 From jbhateja at openjdk.org Thu May 22 13:29:11 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 13:29:11 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v2] In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. 
> These instructions operate on a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live registers, thus the resulting spill sequence should not modify the contents of the registers. > > Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25351/files - new: https://git.openjdk.org/jdk/pull/25351/files/79d7778e..efc4f011 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=00-01 Stats: 22 lines in 1 file changed: 10 ins; 2 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From jbhateja at openjdk.org Thu May 22 13:32:54 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 13:32:54 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 09:57:47 GMT, Roberto Castañeda Lozano wrote: > > Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`).
Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/restore sequences around native methods; there is already an existing test point for it: https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2901240075 From chagedorn at openjdk.org Thu May 22 13:52:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 May 2025 13:52:07 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter Message-ID: When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGraphPrinter.cpp` which says that maximally 2 chars are allowed for numbers: https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 But we already allow larger entries today: ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) I therefore propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants.
Without patch: ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) With patch: ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) Thanks, Christian ------------- Commit messages: - 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter Changes: https://git.openjdk.org/jdk/pull/25393/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25393&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357568 Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25393/head:pull/25393 PR: https://git.openjdk.org/jdk/pull/25393 From roland at openjdk.org Thu May 22 14:08:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 14:08:20 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. 
> > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/66e960aa..d409deb4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=26-27 Stats: 23 lines in 2 files changed: 11 ins; 4 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Thu May 22 14:08:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 14:08:20 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24] In-Reply-To: References: Message-ID: <6pb3IetxpG89G3u8BkV-5Lt8pQSG98qZyGq75TRxvOU=.0d9b6fc1-bd7c-4384-bcfc-7c8861d77abf@github.com> On Wed, 21 May 2025 12:47:21 GMT, Christian Hagedorn wrote: >> Sounds good. New commit has this renaming. Question now is what we do with `ParsePredicate::trace_cloned_parse_predicate()` that wouldn't always print a message that makes sense. > > Good catch. That is now off as well. Additionally, it should probably be `TraceLoopUnswitching` and not `TraceLoopPredicate`. > > We could return the `ParsePredicate` from `clone_parse_predicate()` which is called from `CloneUnswitchedLoopPredicatesVisitor::visit()` and then call it from there. Maybe something like below? > >

> Patch Suggestion (untested) > > > Index: src/hotspot/share/opto/predicates.hpp > IDEA additional info: > Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP > <+>UTF-8 > =================================================================== > diff --git a/src/hotspot/share/opto/predicates.hpp b/src/hotspot/share/opto/predicates.hpp > --- a/src/hotspot/share/opto/predicates.hpp (revision a0cdf36bdfeca9cd8b669859700d63d5ee627458) > +++ b/src/hotspot/share/opto/predicates.hpp (date 1747831252516) > @@ -288,8 +288,6 @@ > } > > static ParsePredicateNode* init_parse_predicate(const Node* parse_predicate_proj, Deoptimization::DeoptReason deopt_reason); > - NOT_PRODUCT(static void trace_cloned_parse_predicate(bool is_false_path_loop, > - const ParsePredicateSuccessProj* success_proj);) > > public: > ParsePredicate(Node* parse_predicate_proj, Deoptimization::DeoptReason deopt_reason) > @@ -320,8 +318,8 @@ > return _success_proj; > } > > - ParsePredicate clone_to_unswitched_loop(Node* new_control, bool is_false_path_loop, > - PhaseIdealLoop* phase) const; > + ParsePredicate clone_to_unswitched_loop(Node* new_control, bool is_false_path_loop, PhaseIdealLoop* phase) const; > + NOT_PRODUCT(void trace_cloned_parse_predicate(bool is_false_path_loop) const;) > > void kill(PhaseIterGVN& igvn) const; > }; > @@ -1158,10 +1156,11 @@ > ClonePredicateToTargetLoop(LoopNode* target_loop_head, const NodeInLoopBody& node_in_loop_body, PhaseIdealLoop* phase); > > // Clones the provided Parse Predicate to the head of the current predicate chain at the target loop. > - void clone_parse_predicate(const ParsePredicate& parse_predicate, bool is_false_path_loop) { > + ParsePredicate clone_parse_predicate(const ParsePredicate& parse_predicate, bool is_false_path_loop) { > ParsePredicate cloned_parse_predicate = parse_predicate.clone_to_unswitched_loop(_old_target_loop_entry, > is_false_path_loop, _phase); > _target_loop_predicate_chain.insert_predicate(cloned_parse_predicate); > + ... 
Thanks for the patch. I applied it and did some smoke testing. I think there's a mistake at the end: - _clone_predicate_to_true_path_loop.clone_parse_predicate(parse_predicate, false); - _clone_predicate_to_false_path_loop.clone_parse_predicate(parse_predicate, true); + clone_parse_predicate(parse_predicate, false); + clone_parse_predicate(parse_predicate, true); and +void CloneUnswitchedLoopPredicatesVisitor::clone_parse_predicate(const ParsePredicate& parse_predicate, + const bool is_false_path_loop) { + const ParsePredicate cloned_parse_predicate = + _clone_predicate_to_true_path_loop.clone_parse_predicate(parse_predicate, false); + NOT_PRODUCT(cloned_parse_predicate.trace_cloned_parse_predicate(is_false_path_loop);) +} lines added only use `_clone_predicate_to_true_path_loop` and not `_clone_predicate_to_false_path_loop`. Commit I pushed should fix that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2102659258 From chagedorn at openjdk.org Thu May 22 14:44:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 May 2025 14:44:07 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: References: Message-ID: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> > When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. 
There is a comment in `idealGraphPrinter.cpp` which says that at most 2 characters are allowed for numbers: > https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 > > But we already allow larger entries today: > ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) > > I therefore propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. > > Without patch: > ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) > > > With patch: > ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Increase number of parameters as suggested by Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25393/files - new: https://git.openjdk.org/jdk/pull/25393/files/1de489e5..4f77d003 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25393&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25393&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 4 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25393/head:pull/25393 PR: https://git.openjdk.org/jdk/pull/25393 From mhaessig at openjdk.org Thu May 22 14:44:07 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Thu, 22 May 2025 14:44:07 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Thu, 22 May
2025 14:41:02 GMT, Christian Hagedorn wrote: >> When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGraphPrinter.cpp` which says that at most 2 characters are allowed for numbers: >> https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 >> >> But we already allow larger entries today: >> ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) >> >> I therefore propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. >> >> Without patch: >> ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) >> >> >> With patch: >> ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Increase number of parameters as suggested by Manuel That is a nice convenience. Thank you for the improvement. It looks good to me :slightly_smiling_face: ------------- Marked as reviewed by mhaessig (Author).
PR Review: https://git.openjdk.org/jdk/pull/25393#pullrequestreview-2861519452 From chagedorn at openjdk.org Thu May 22 14:44:07 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 22 May 2025 14:44:07 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter In-Reply-To: References: Message-ID: On Thu, 22 May 2025 13:46:39 GMT, Christian Hagedorn wrote: > When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGraphPrinter.cpp` which says that at most 2 characters are allowed for numbers: > https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 > > But we already allow larger entries today: > ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) > > I therefore propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. > > Without patch: > ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) > > > With patch: > ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) > > Thanks, > Christian Thanks Manuel for your review! As suggested, I also removed the limit for parameters which cannot exceed 255 which is more than 4 characters (e.g. `P123`).
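To make the labelling rule from the PR description concrete, here is a minimal sketch. It is not the code in `idealGraphPrinter.cpp`: the function names and the constant `kMaxCondensedChars` are hypothetical, and it assumes that the 4-character budget counts every printed character (including a minus sign) and that non-null pointers keep the `P` label.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers, not taken from idealGraphPrinter.cpp.
// Assumption: the budget counts every character of the printed
// decimal value, including a leading minus sign.
constexpr std::size_t kMaxCondensedChars = 4;

// Label for an int or long constant: the value itself when it is short
// enough, otherwise the one-letter fallback ("I" for int, "L" for long).
std::string condensed_number_label(long long value, bool is_long) {
  const std::string text = std::to_string(value);
  if (text.size() <= kMaxCondensedChars) {
    return text;
  }
  return is_long ? "L" : "I";
}

// Label for a pointer constant: "NULL" for the null pointer,
// "P" otherwise (the non-null case is an assumption).
std::string condensed_pointer_label(bool is_null) {
  return is_null ? "NULL" : "P";
}
```

With these assumptions, `1234` is shown as is, while `12345` falls back to `I` and a null pointer constant is shown as `NULL`.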
------------- PR Comment: https://git.openjdk.org/jdk/pull/25393#issuecomment-2901490441 From bkilambi at openjdk.org Thu May 22 14:59:57 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 22 May 2025 14:59:57 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> <4DqS6Sfys6GAONfkAQk8KNGyuqD7UvgJ8t5taO99goU=.907d894e-872a-45de-b9cb-aaf44a3ffd35@github.com> Message-ID: On Thu, 22 May 2025 11:58:46 GMT, Emanuel Peter wrote: >> Hi @eme64 I have updated the testcase. Could you please test it now if it works? Thanks! > > @Bhavana-Kilambi launched! Hi @eme64 Hope the tests have passed! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2901549543 From fjiang at openjdk.org Thu May 22 15:00:54 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 22 May 2025 15:00:54 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v17] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 10:55:19 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic reduces the time of unsafe setmemory considerably. >> >> On my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows the following: >> >> Before the patch: >>
>> Benchmark (aligned) (size) Mode Cnt Score Error Units
>> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op
>> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op
>> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op
>> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op
>> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op
>> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op
>> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op
>> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op
>> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op
>> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op
>> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op
>> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op
>> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op
>> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op
>> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op
>> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op
>> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op
>> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op
>> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op
>> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op
>> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op
>> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op
>> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op
>> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/o...
> > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > add new line for bind Marked as reviewed by fjiang (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/23890#pullrequestreview-2861583105 From dlunden at openjdk.org Thu May 22 15:04:01 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 22 May 2025 15:04:01 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:17:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, which greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies. >> >> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. >> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly.
>> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix uncommon_freq Thanks for working on this @merykitty! Here is my long overdue review. I've studied both the existing logic in `PhaseChaitin::Split` (took quite some time) as well as your changes. First, let me explain my understanding of the problem. Let me know if I've misunderstood something. In pass 1 during splitting, we get to a loop Phi with (at least) one unknown input, and determine from the other known inputs that the Phi is UP. Later on, it turns out one of the unknown inputs actually went DOWN, and additionally in a high frequency block. The Phi prematurely set to UP now forces us to put an expensive reload in the high frequency block. > A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. I can reproduce the improvements for `LoopCounterBench.java`, so that looks good. Unfortunately, benchmarks (DaCapo, Renaissance, SPECjvm, SPECjbb) indicate more regressions than improvements. I've merged your changes with `master` and will rerun the benchmarks to double check. Regarding the changeset, I think the code looks good overall. Can you elaborate more regarding the motivation behind the decision table and 10% threshold in `should_spill_before_loop`? What experiments did you run to confirm these are good choices? Perhaps the table and/or the threshold can be tuned to avoid the regressions found during the benchmarking above? 
While I do think your suggested changes look reasonable, it seems to me that the fundamental problem is really that we only have partial reaching definitions information available when inserting Phis? I'm guessing it is too expensive to compute complete reaching definitions at the beginning of `PhaseChaitin::Split`, and to then update the reaching definitions continuously during splitting (because we add new definitions while splitting)? Would it perhaps be possible to first perform all the non-Phi splits, and then add required Phis afterwards? Just thinking out loud, there is likely some important interaction that I'm not aware of. src/hotspot/share/opto/gcm.cpp line 2305: > 2303: } > 2304: } > 2305: Can you explain this removal? src/hotspot/share/opto/reg_split.cpp line 493: > 491: > 492: // Decide the action at the loop entry based on whether the live range is used and whether it needs spilling > 493: // Common means that the frequency the action performed is more than 10% the frequency of the loop head Suggestion: // Common means that the frequency of the action is more than 10% the frequency of the loop head src/hotspot/share/opto/reg_split.cpp line 494: > 492: // Decide the action at the loop entry based on whether the live range is used and whether it needs spilling > 493: // Common means that the frequency the action performed is more than 10% the frequency of the loop head > 494: // Untaken means that the frequency the action performed is not more than the frequency of the loop entry (outside the loop) Suggestion: // Untaken means that the frequency of the action is not more than the frequency of the loop entry (outside the loop) src/hotspot/share/opto/reg_split.cpp line 507: > 505: // Untaken | Reload | Reload | None > 506: // > 507: // In general, if a live range is spilt more than it is used, we try to eagerly spill it, and vice versa, spilt (UK) -> spilled (US) to match the current spelling in the source code. Also in other places.
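To make the frequency categories in the quoted comments concrete, here is a minimal sketch of the classification as worded there. Only the 10% threshold and the meanings of "Common" and "Untaken" come from the review comments; the enum, the function `classify_action`, the name `Uncommon` for the middle category, and the order in which the two tests are applied are assumptions, not code from `reg_split.cpp`.

```cpp
// Hypothetical names; this is not the code from reg_split.cpp.
enum class ActionFreq { Untaken, Uncommon, Common };

// Threshold from the quoted comments: more than 10% of the loop head frequency.
constexpr double uncommon_threshold = 0.1;

// Classify how often an action (a spill or a use of the live range) happens
// inside the loop, relative to the loop head and to the loop entry block.
// Assumption: the Common test takes precedence when both tests would match.
inline ActionFreq classify_action(double action_freq,
                                  double loop_head_freq,
                                  double loop_entry_freq) {
  if (action_freq > uncommon_threshold * loop_head_freq) {
    return ActionFreq::Common;    // more than 10% of the loop head frequency
  }
  if (action_freq <= loop_entry_freq) {
    return ActionFreq::Untaken;   // not more frequent than the loop entry
  }
  return ActionFreq::Uncommon;    // everything in between
}
```

For example, with a loop head frequency of 10 and a loop entry frequency of 1, an action with frequency 5 would be Common and one with frequency 0.5 would be Untaken. The decision table that maps these categories to spill or reload actions is only partially visible in the quoted lines, so it is not reproduced here.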
src/hotspot/share/opto/reg_split.cpp line 513: > 511: static SpillAction should_spill_before_loop(const PhaseCFG& cfg, PhaseChaitin& chaitin, CFGLoop* loop, uint lrg_idx, const LRG& lrg) { > 512: constexpr double uncommon_threshold = 0.1; > 513: assert(&chaitin.lrgs(lrg_idx) == &lrg, "must be"); Please add a more descriptive assert message than the generic "must be". src/hotspot/share/opto/reg_split.cpp line 522: > 520: Block* b = cfg.get_block(bidx); > 521: if (!loop->in_loop_nest(b)) { > 522: continue; Is there not a more efficient way to iterate through all the loops in the loop nest? src/hotspot/share/opto/reg_split.cpp line 525: > 523: } > 524: > 525: // Implementation details: high pressure only records the start idx, not the end idx Suggestion: // Implementation detail: we only record the start idx of the high pressure within the block; there is no end idx This is my understanding of how the high pressure tracking works, but please correct me if I'm wrong. src/hotspot/share/opto/reg_split.cpp line 527: > 525: // Implementation details: high pressure only records the start idx, not the end idx > 526: if (is_high_pressure(b, &lrg, b->end_idx())) { > 527: // If a node needs spilling in a child loop, we can spill it at the child entry, too. Choose the best option. I'm a bit confused by "Choose the best option". Am I correct to interpret this as "do not spill at the loop entry if there is a better option further down the loop tree that we will reach later on in pass 1"? Would be great to elaborate a bit in the comment (also further down, for the detection of uses). ------------- Changes requested by dlunden (Committer).
PR Review: https://git.openjdk.org/jdk/pull/21472#pullrequestreview-2861541732 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102752992 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102757048 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102757834 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102762088 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102763947 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102766052 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102770416 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2102771540 From epeter at openjdk.org Thu May 22 15:07:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 22 May 2025 15:07:55 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> <4DqS6Sfys6GAONfkAQk8KNGyuqD7UvgJ8t5taO99goU=.907d894e-872a-45de-b9cb-aaf44a3ffd35@github.com> Message-ID: On Thu, 22 May 2025 14:57:39 GMT, Bhavana Kilambi wrote: >> @Bhavana-Kilambi launched! > > Hi @eme64 Hope the tests have passed! @Bhavana-Kilambi We are at about `75%`; often the last `25%` takes a little longer when some platforms have a high load. Just ping me again tomorrow ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2901576083 From fjiang at openjdk.org Thu May 22 15:10:11 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 22 May 2025 15:10:11 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size Message-ID: Please consider.
As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. This PR also removes the **aligned tail part** in consideration of code size and test coverage. As the results below show, there are no significant regressions.

Before:
Benchmark (size) Mode Cnt Score Error Units
ArrayFill.fillByteArray 7 avgt 12 27.215 ± 0.073 ns/op
ArrayFill.fillByteArray 15 avgt 12 32.687 ± 0.904 ns/op
ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.006 ns/op
ArrayFill.fillIntArray 15 avgt 12 29.351 ± 0.009 ns/op
ArrayFill.fillShortArray 7 avgt 12 30.776 ± 0.006 ns/op
ArrayFill.fillShortArray 15 avgt 12 31.724 ± 0.447 ns/op
ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.006 ns/op
ArrayFill.zeroByteArray 15 avgt 12 32.685 ± 0.900 ns/op
ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.007 ns/op
ArrayFill.zeroIntArray 15 avgt 12 29.352 ± 0.011 ns/op
ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.006 ns/op
ArrayFill.zeroShortArray 15 avgt 12 31.497 ± 0.012 ns/op

After:
Benchmark (size) Mode Cnt Score Error Units
ArrayFill.fillByteArray 7 avgt 12 20.137 ± 0.042 ns/op
ArrayFill.fillByteArray 15 avgt 12 32.928 ± 0.004 ns/op
ArrayFill.fillIntArray 7 avgt 12 28.630 ± 0.004 ns/op
ArrayFill.fillIntArray 15 avgt 12 29.344 ± 0.005 ns/op
ArrayFill.fillShortArray 7 avgt 12 31.494 ± 0.004 ns/op
ArrayFill.fillShortArray 15 avgt 12 31.492 ± 0.008 ns/op
ArrayFill.zeroByteArray 7 avgt 12 19.980 ± 0.164 ns/op
ArrayFill.zeroByteArray 15 avgt 12 32.927 ± 0.004 ns/op
ArrayFill.zeroIntArray 7 avgt 12 28.629 ± 0.005 ns/op
ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.006 ns/op
ArrayFill.zeroShortArray 7 avgt 12 32.193 ± 0.027 ns/op
ArrayFill.zeroShortArray 15 avgt 12 31.495 ± 0.010 ns/op

Testing: - [x] tier1 ------------- Commit messages: - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill - optimize array fill stub for small size Changes: https://git.openjdk.org/jdk/pull/25350/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357460 Stats: 61 lines in 1 file changed: 9 ins; 27 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/25350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25350/head:pull/25350 PR: https://git.openjdk.org/jdk/pull/25350 From roland at openjdk.org Thu May 22 15:20:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 15:20:14 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> > An `Initialize` node for an `Allocate` node is created with a memory > `Proj` of adr type raw memory. In order for stores to be captured, the > memory state out of the allocation is a `MergeMem` with slices for the > various object fields/array element set to the raw memory `Proj` of > the `Initialize` node. If `Phi`s need to be created during later > transformations from this memory state, the `Phi` for a particular > slice gets its adr type from the type of the `Proj` which is raw > memory. If during macro expansion, the `Allocate` is found to have no > use and so can be removed, the `Proj` out of the `Initialize` is > replaced by the memory state on input to the `Allocate`.
A `Phi` for > some slice for a field of an object will end up with the raw memory > state on input to the `Allocate` node. As a result, memory state at > the `Phi` is incorrect and incorrect execution can happen. > > The fix I propose is, rather than have a single `Proj` for the memory > state out of the `Initialize` with adr type raw memory, to use one > `Proj` per slice added to the memory state after the `Initialize`. Each > of the `Proj` should return the right adr type for its slice. For that > I propose having a new type of `Proj`: `NarrowMemProj` that captures > the right adr type. > > Logic for the construction of the `Allocate`/`Initialize` subgraph is > tweaked so the right adr type captured in its own `NarrowMemProj` is > added to the memory subgraph. Code that removes an allocation or moves > it also has to be changed so it correctly takes the multiple memory > projections out of the `Initialize` node into account. > > One tricky issue is that when EA splits types for a scalar replaceable > `Allocate` node: > > 1- the adr type captured in the `NarrowMemProj` becomes out of sync > with the type of the slices for the allocation > > 2- before EA, the memory state for one particular field out of the > `Initialize` node can be used for a `Store` to the just allocated > object or some other. So we can have a chain of `Store`s, some to > the newly allocated object, some to some other objects, all of them > using the state of `NarrowMemProj` out of the `Initialize`. After > split unique types, the `NarrowMemProj` is for the slice of a > particular allocation. So `Store`s to some other objects shouldn't > use that memory state but the memory state before the `Allocate`. > > For that, I added logic to update the adr type of `NarrowMemProj` > during split unique types and update the memory input of `Store`s that > don't depend on the memory state ...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24570/files - new: https://git.openjdk.org/jdk/pull/24570/files/a6c6c044..43c6f822 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24570&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24570/head:pull/24570 PR: https://git.openjdk.org/jdk/pull/24570 From roland at openjdk.org Thu May 22 15:20:20 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 15:20:20 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v7] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> Message-ID: <8oRjljJRefHHVQITVQLllm0M1wIts72_ViO3s8FQcwU=.8341c74b-ddfd-4155-a0b0-358f6d8063a2@github.com> On Wed, 21 May 2025 10:11:50 GMT, Roberto Castañeda Lozano wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 33 additional commits since the last revision: >> >> - new test tweak >> - new test >> - Merge branch 'master' into JDK-8327963 >> - Merge branch 'master' into JDK-8327963 >> - typo >> - more >> - more >> - more >> - more >> - more >> - ...
and 23 more: https://git.openjdk.org/jdk/compare/ab166f28...a6c6c044 > > test/hotspot/jtreg/compiler/macronodes/TestEarlyEliminationOfAllocationWithoutUse.java line 54: > >> 52: private static void test1(A otherA, boolean[] flags) { >> 53: if (flags == null) { >> 54: } > > Consider removing these two lines, which do not seem essential to reproduce the issue. Removing them also gets rid of the deoptimization, which simplifies the failure analysis. Good catch. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2102822733 From dskantz at openjdk.org Thu May 22 15:24:31 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Thu, 22 May 2025 15:24:31 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" Message-ID: This pull request contains a fix for JDK-8357105. The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. Testing: Tier1-4. Extra testing: Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/25395/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25395&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357105 Stats: 58 lines in 2 files changed: 56 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25395.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25395/head:pull/25395 PR: https://git.openjdk.org/jdk/pull/25395 From never at openjdk.org Thu May 22 15:34:00 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 22 May 2025 15:34:00 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:59:33 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed trailing space This seems reasonable to me. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25307#pullrequestreview-2861697506 From mchevalier at openjdk.org Thu May 22 15:34:15 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 22 May 2025 15:34:15 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: References: Message-ID: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - A test - Merge branch 'master' into fix/do_unroll-assert - Relax the assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25295/files - new: https://git.openjdk.org/jdk/pull/25295/files/f4084179..9c44f068 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=00-01 Stats: 60447 lines in 903 files changed: 35531 ins; 20408 del; 4508 mod Patch: https://git.openjdk.org/jdk/pull/25295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25295/head:pull/25295 PR: https://git.openjdk.org/jdk/pull/25295 From mchevalier at openjdk.org Thu May 22 15:39:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 22 May 2025 15:39:51 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: <8q08MdGP6Oo_brVI0kxsPYm1XSH1wYEguhJbnD0i1LI=.ea512821-a21c-453b-96e6-b64f7e5fb94d@github.com> References: <8q08MdGP6Oo_brVI0kxsPYm1XSH1wYEguhJbnD0i1LI=.ea512821-a21c-453b-96e6-b64f7e5fb94d@github.com> Message-ID: On Wed, 21 May 2025 14:48:16 GMT, Christian Hagedorn wrote: >> This assert seems a bit too tight. 
See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Drive-by comment: Were you able to extract a regression test that does not require the stress peeling flag? Thanks to @chhagedorn's help, there is now a test that does not need the stress flag. Thanks! We also looked into this in more detail, and we cannot reach the case where the new trip count would be 2^31 because this branch is taken only if `has_exact_trip_count()` is true, which means the trip count was set with `set_exact_trip_count`, which happens only here: https://github.com/openjdk/jdk/blob/e961b13cd68bc352b86af17c7e53df8537519beb/src/hotspot/share/opto/loopTransform.cpp#L133-L141 so only with a trip count < `max_juint` = 2^32-1, so at most 2^32-2. We are safe! Overall, the patch is now: - an additional assert about the old trip count - changing `<` into `<=`, which keeps the sanity check just as obvious - adding a test ------------- PR Comment: https://git.openjdk.org/jdk/pull/25295#issuecomment-2901694311 From roland at openjdk.org Thu May 22 15:57:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 22 May 2025 15:57:02 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop Message-ID: `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because we ran into some issues where a `Type` node is sunk and then becomes `top` but the control path of its uses doesn't become unreachable. 8349479 should have fixed that, so that exception no longer makes sense.
------------- Commit messages: - more - fix Changes: https://git.openjdk.org/jdk/pull/25396/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25396&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354383 Stats: 6 lines in 1 file changed: 5 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25396/head:pull/25396 PR: https://git.openjdk.org/jdk/pull/25396 From dnsimon at openjdk.org Thu May 22 17:04:00 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:04:00 GMT Subject: Integrated: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 In-Reply-To: References: Message-ID: On Mon, 19 May 2025 17:50:21 GMT, Doug Simon wrote: > As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: > > > Error occurred during initialization of VM > java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) > > > This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. > Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. This pull request has now been integrated. 
Changeset: 1258af42 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/1258af42bec92a2797897cb6126b60b582a29d76 Stats: 7 lines in 2 files changed: 7 ins; 0 del; 0 mod 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 Reviewed-by: never, yzheng ------------- PR: https://git.openjdk.org/jdk/pull/25307 From dnsimon at openjdk.org Thu May 22 17:03:59 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 17:03:59 GMT Subject: RFR: 8357135: java.lang.OutOfMemoryError: Error creating or attaching to libjvmci after JDK-8356447 [v5] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 20:59:33 GMT, Doug Simon wrote: >> As of [JDK-8356447](https://bugs.openjdk.org/browse/JDK-8356447), libgraal initialization happens during VM startup. If during this initialization, the libgraal heap cannot be created due to lack of virtual address space, the VM will exit with: >> >> >> Error occurred during initialization of VM >> java.lang.OutOfMemoryError: Error creating or attaching to libjvmci (err: -1000000801, description: Reserving address space for the new isolate failed.) >> >> >> This causes problems for tests that limit the virtual address space with `ulimit -v` such as `gc/arguments/TestUseCompressedOopsFlagsWithUlimit.java` and `vmTestbase/nsk/jvmti/Allocate/alloc001/alloc001.java`. >> Since these tests were passing on libgraal prior to JDK-8356447, they obviously do not require JIT compilation. The simplest fix is to then use `-Xint` to disable the JIT. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > removed trailing space Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25307#issuecomment-2901962121 From jbhateja at openjdk.org Thu May 22 17:42:06 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 17:42:06 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v3] In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: > The patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate on a pair of registers, resulting in smaller save/restore JIT code. On the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live registers, thus the resulting spill sequence should not modify the contents of the registers. > > The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25351/files - new: https://git.openjdk.org/jdk/pull/25351/files/efc4f011..9b5c2ac4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=01-02 Stats: 7 lines in 1 file changed: 3 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From sviswanathan at openjdk.org Thu May 22 17:49:53 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 22 May 2025 17:49:53 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v3] In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 17:42:06 GMT, Jatin Bhateja wrote: >> The patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. >> These instructions operate on a pair of registers, resulting in smaller save/restore JIT code. On the flip side, they have hard alignment and balancing constraints, as they operate on a 16-byte aligned stack address. >> ZRuntimeCallSpill is agnostic to live registers, thus the resulting spill sequence should not modify the contents of the registers. >> >> The patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25351#pullrequestreview-2862099217 From dnsimon at openjdk.org Thu May 22 18:03:31 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 22 May 2025 18:03:31 GMT Subject: RFR: 8357581: [JVMCI] Add ProfilingInfo.getDecompileCount Message-ID: Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. ------------- Commit messages: - added HotSpotProfilingInfo Changes: https://git.openjdk.org/jdk/pull/25397/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25397&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357581 Stats: 235 lines in 5 files changed: 17 ins; 194 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/25397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25397/head:pull/25397 PR: https://git.openjdk.org/jdk/pull/25397 From kbarrett at openjdk.org Thu May 22 18:08:52 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 22 May 2025 18:08:52 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Thu, 22 May 2025 14:44:07 GMT, Christian Hagedorn wrote: >> When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), 
respectively. There is a comment in `idealGraphPrinter.cpp` which says that at most 2 characters are allowed for numbers: >> https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 >> >> But we already allow larger entries today: >> ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) >> >> I therefore propose to use `NULL` and allow up to 4 characters for numbers, which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. >> >> Without patch: >> ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) >> >> >> With patch: >> ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Increase number of parameters as suggested by Manuel src/hotspot/share/opto/idealGraphPrinter.cpp line 676: > 674: } else if (t->base() == Type::AnyPtr) { > 675: if (t->is_ptr()->ptr() == TypePtr::Null) { > 676: print_prop(short_name, "NULL"); I'm surprised this doesn't trip over sources/TestNoNULL.java. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2103131624 From never at openjdk.org Thu May 22 18:32:53 2025 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 22 May 2025 18:32:53 GMT Subject: RFR: 8357581: [JVMCI] Add ProfilingInfo.getDecompileCount In-Reply-To: References: Message-ID: On Thu, 22 May 2025 17:12:34 GMT, Doug Simon wrote: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). > This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed.
> The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25397#pullrequestreview-2862216006 From jbhateja at openjdk.org Thu May 22 19:34:58 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 22 May 2025 19:34:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v34] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 23:35:39 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with five additional commits since the last revision: > > - refactor to use is_P6_or_later() > - rename byte1 to opcode_byte > - rename evex_opcode_prefix_and_encode as emit_eevex_or_demote > - rename evex to eevex in method names > - reset swap=false as default src/hotspot/cpu/x86/assembler_x86.cpp line 12862: > 12860: emit_prefix_and_int8(get_prefixq(src2, dst, is_map1), opcode_byte); > 12861: } > 12862: else { Suggestion: } else { src/hotspot/cpu/x86/assembler_x86.cpp line 12968: > 12966: encode = is_prefixq ? 
prefixq_and_encode(dst_enc, src_enc, is_map1) : prefix_and_encode(dst_enc, src_enc, is_map1); > 12967: } > 12968: else { Suggestion: } else { src/hotspot/cpu/x86/assembler_x86.cpp line 12978: > 12976: encode = vex_prefix_and_encode(nds_enc, dst_enc, src_enc, pre, opc, &attributes, /* src_is_gpr */ true, /* nds_is_ndd */ true, no_flags); > 12977: } > 12978: else { Suggestion: } else { src/hotspot/cpu/x86/vm_version_x86.hpp line 680: > 678: static int cpu_family() { return _cpu;} > 679: static bool is_P6() { return cpu_family() >= 6; } > 680: static bool is_P6_or_later() { return cpu_family() == 6 || cpu_family() == 18 || cpu_family() == 19; } Do we need the cpu_family() == 18 check? 19 is for Diamond Rapids and 6 for all Xeons before it, including E-core variants. src/hotspot/cpu/x86/vm_version_x86.hpp line 680: > 678: static int cpu_family() { return _cpu;} > 679: static bool is_P6() { return cpu_family() >= 6; } > 680: static bool is_P6_or_later() { return cpu_family() == 6 || cpu_family() == 18 || cpu_family() == 19; } ``` suggestion static bool is_intel_server_family() { return cpu_family() == 6 || cpu_family() == 18 || cpu_family() == 19; } We already have is_P6(), which returns true for CPU family >= 6, so this is a minor name change suggestion.
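To make the naming concern above concrete, here is a minimal standalone sketch. It assumes two hypothetical free functions that take the family number as a parameter and mirror the predicates quoted from `vm_version_x86.hpp`; it is not the HotSpot code itself, and the family value 15 is used only as an arbitrary number between 6 and 18.

```cpp
// Hypothetical stand-ins for the predicates quoted from vm_version_x86.hpp.
// The family is passed in explicitly instead of being read from CPUID.
constexpr bool is_P6(int family) {
  return family >= 6;  // range check: any family from 6 upwards
}

constexpr bool is_intel_server_family(int family) {
  // Explicit list, same values as the quoted is_P6_or_later() body.
  return family == 6 || family == 18 || family == 19;
}

// Both predicates agree on the listed families...
static_assert(is_P6(6) && is_intel_server_family(6), "family 6");
static_assert(is_P6(19) && is_intel_server_family(19), "family 19");
// ...but differ for an unlisted family at or above 6.
static_assert(is_P6(15) && !is_intel_server_family(15), "family 15");
// Neither accepts a family below 6.
static_assert(!is_P6(5) && !is_intel_server_family(5), "family 5");
```

Under these assumptions, a range check and an explicit family list disagree for any family that is at least 6 but not listed, which is presumably why a name close to `is_P6()` could mislead readers of the explicit-list predicate.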
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103266719 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103267701 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103268316 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103252756 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103260881 From sparasa at openjdk.org Thu May 22 19:51:38 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 22 May 2025 19:51:38 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v35] In-Reply-To: References: Message-ID: <9ktu7V6RTRTpsYRsIL2VU9CeRSTnWdMHm3LTeL8H4zY=.4d3057ef-181f-483e-9293-0aeb1669db80@github.com> > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/cpu/x86/assembler_x86.cpp Co-authored-by: Jatin Bhateja - Update src/hotspot/cpu/x86/assembler_x86.cpp Co-authored-by: Jatin Bhateja - Update src/hotspot/cpu/x86/assembler_x86.cpp coding style Co-authored-by: Jatin Bhateja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/110db142..c6718a15 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=33-34 Stats: 6 lines in 1 file changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From duke at openjdk.org Thu May 22 20:23:22 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 22 May 2025 20:23:22 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v16] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. 
New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: - Update tests - Exclude JVMCI methods - Create nmethod relocation stress test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/9ca3563a..398a4dc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=14-15 Stats: 312 lines in 10 files changed: 261 ins; 34 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu May 22 20:23:29 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 22 May 2025 20:23:29 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v15] In-Reply-To: References: Message-ID: <5WkvCTqK-OpOPBmO0PruIiRLPIUydIW8-c0JY_c-EkQ=.72ad2e70-267d-407b-97c4-4cf2cd4e36e9@github.com> On Thu, 8 May 2025 19:31:43 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 54 additional commits since the last revision: > > - Fix null check > - Remove unnecessary include > - Add nullptr check to relocate > - Fix JVMCI nmethod data > - Unexclude JVMCI methods > - Add relocate_nmethod_mirror > - Only hold NMethodState_lock when needed > - Exclude JVMCI nmethods > - Remove StressNMethodRelocation > - Fix branch_range revert > - ... and 44 more: https://git.openjdk.org/jdk/compare/1666d4f3...9ca3563a JVMCI compiled nmethods have been excluded from relocation due to concerns about safely updating their mirror fields. Graal can deoptimize methods and update these fields without acquiring locks, which introduces a potential race condition between relocation and field resetting. One possible solution is to introduce a lock when updating these fields. The lock must not block safepoints, since updates can be triggered from Java code. However, this would require that mirror updates do not hold any `_no_safepoint_check` locks. This is currently not the case, such as during deoptimization caused by uncommon traps ([source](https://github.com/openjdk/jdk/blob/139a05d05959a84541a29dfae6151f92ce579ae6/src/hotspot/share/runtime/deoptimization.cpp#L2456)). Another potential approach is to perform relocation at a safepoint, which would guarantee that Graal is not concurrently updating the fields. For now, excluding JVMCI methods from relocation appears to be the safest option. I'm in the process of writing a JBS issue to track this and eventually re-enable relocation for these methods. If anyone has suggestions for alternative solutions or improvements, I'd greatly appreciate the feedback.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2902454015 From iveresov at openjdk.org Thu May 22 20:25:21 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Thu, 22 May 2025 20:25:21 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v23] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 88 commits: - Merge branch 'master' into pp2 - Merge branch 'master' into pp2 - 8357284: runtime/cds/appcds/aotProfile/AOTProfileFlags.java fails on non-debug platform - 8357283: compiler/debug/TestStressBailout.java hangs when running with AOT cache - Merge branch 'master' into pp2 - Address Ioi's comments - Merge branch 'master' into pp2 - Address Ioi's comments - 8356885: Don't emit C1 profiling for casts if TypeProfileCasts is off Reviewed-by: vlivanov, kvn - 8352755: Misconceptions about j.text.DecimalFormat digits during parsing Reviewed-by: naoto - ... 
and 78 more: https://git.openjdk.org/jdk/compare/139a05d0...7a350671 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=22 Stats: 3324 lines in 59 files changed: 3111 ins; 100 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From jrose at openjdk.org Thu May 22 20:27:01 2025 From: jrose at openjdk.org (John R Rose) Date: Thu, 22 May 2025 20:27:01 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v8] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Thu, 15 May 2025 16:03:44 GMT, Andrew Haley wrote: >> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic). >> >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentFillUnsafe.panama true 262143 avgt 10 7295.638 ± 0.422 ns/op >> MemorySegmentFillUnsafe.panama false 262143 avgt 10 8345.300 ± 80.161 ns/op >> MemorySegmentFillUnsafe.unsafe true 262143 avgt 10 2930.594 ± 0.180 ns/op >> MemorySegmentFillUnsafe.unsafe false 262143 avgt 10 3136.828 ± 0.232 ns/op > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Copyright format correction Nice! There's a nicely written loop tail that handles power-of-two chunks from 32 bytes (stpq) down to a single byte. Like many such tails, it is O(lg N), N being the max tail size, and that can be annoying when the loop tail is most or all of the work.
One thing that sometimes helps is a count leading zeroes followed by a multiway switch at the start, or just before the tail, to get started at the right place in the tail (its log-size cascade), for very small inputs. This PR https://github.com/openjdk/jdk/pull/25383 uses clz in that way. It also uses an overlapping-store technique to reduce an O(lg N) tail to an O(1) tail, which also depends on the clz step. When atomicity is not an issue, the overlapping-store technique is faster on my MacBook M1. It lets you (say) store 7 bytes in two cycles and no extra branches. The downside is some bytes get stored twice (in the overlap), so it only works on unshared memory. My rough notes on the relative performance of overlapping loads and stores are here FWIW: https://cr.openjdk.org/~jrose/jvm/PartialMemoryWord.cpp BTW, overlapping loads (properly bit-masked) are just as atomic as loads of individual bytes, and much faster. But that's not the topic here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2902463076 From sparasa at openjdk.org Thu May 22 20:48:38 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 22 May 2025 20:48:38 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v34] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 19:26:36 GMT, Jatin Bhateja wrote: >> Srinivas Vamsi Parasa has updated the pull request incrementally with five additional commits since the last revision: >> >> - refactor to use is_P6_or_later() >> - rename byte1 to opcode_byte >> - rename evex_opcode_prefix_and_encode as emit_eevex_or_demote >> - rename evex to eevex in method names >> - reset swap=false as default > > src/hotspot/cpu/x86/vm_version_x86.hpp line 680: > >> 678: static int cpu_family() { return _cpu;} >> 679: static bool is_P6() { return cpu_family() >= 6; } >> 680: static bool is_P6_or_later() { return cpu_family() == 6 || cpu_family() == 18 || cpu_family() == 19; } > > 
``` suggestion > static bool is_intel_server_family() { return cpu_family() == 6 || cpu_family() == 18 || cpu_family() == 19; } > > We already have is_P6(), which returns true for CPU family >= 6, so this is a minor name change suggestion. Please see the updated code with the naming change as suggested. > src/hotspot/cpu/x86/vm_version_x86.hpp line 680: > >> 678: static int cpu_family() { return _cpu;} >> 679: static bool is_P6() { return cpu_family() >= 6; } >> 680: static bool is_P6_or_later() { return cpu_family() == 6 || cpu_family() == 18 || cpu_family() == 19; } > > Do we need the cpu_family() == 18 check? > 19 is for Diamond Rapids and 6 for all Xeons before it, including E-core variants. Removed cpu_family()==18. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103369018 PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2103369953 From sparasa at openjdk.org Thu May 22 20:48:38 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 22 May 2025 20:48:38 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v36] In-Reply-To: References: Message-ID: > Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is the same as the first source.
> > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: refactor is_P6_or_later and remove cpu_family==18 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24431/files - new: https://git.openjdk.org/jdk/pull/24431/files/c6718a15..3378eaee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=34-35 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From sparasa at openjdk.org Thu May 22 20:48:38 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Thu, 22 May 2025 20:48:38 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: On Fri, 16 May 2025 16:07:32 GMT, Jatin Bhateja wrote: >>> We are only handling first variant of NDD instruction[1] in python test script , please extend the script to cover second variant[2] also. eaddq(Register dst, Register src1, Address src2, bool no_flags) - [1] eaddq(Register dst, Address src1, Register src2, bool no_flags) - [2] >> >> Hank's script is already handling the variant[2] in the `RegMemRegNddInstruction` class, for which no demotion is enabled. The demotion is enabled only for variant[1]. > >> > Hi @vamsi-parasa , I don't see demotion tests being generated with full mode gtest, i.e. 
python3 x86-asmtest.py --full >> >> Please see the updated `x86-asmtest.py` refactored to work with full set (`--full`). Please let me know if anything is missing. > > Hi @vamsi-parasa , > I am seeing some failures with --full mode when ENABLE_DEMOTION=False > /home/jatinbha/sandboxes/apx-release/jdk/test/hotspot/gtest/x86/test_assembler_x86.cpp:61: Failure > Failed > __ ecmovq (Assembler::Condition::greater, r31, r31, Address(rcx, rdx, (Address::ScaleFactor)0, +0x3c8d1915)); > OpenJDK: cc cc cc cc cc cc cc cc cc cc cc > GNU Assembler: 62 64 84 10 4f bc 11 15 19 8d 3c > [ FAILED ] AssemblerX86.validate_vm (13562 ms) > [----------] 1 test from AssemblerX86 (13708 ms total) Hi Jatin (@jatin-bhateja), Incorporated the changes suggested for cpu_family and is_P6_or_later() and other minor changes. Please let me know if everything looks good. Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2902523505 From kvn at openjdk.org Thu May 22 22:01:57 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 22 May 2025 22:01:57 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo In-Reply-To: References: Message-ID: On Thu, 22 May 2025 17:12:34 GMT, Doug Simon wrote: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). > This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. Just one cosmetic comment about copyright year. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotProfilingInfo.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. 
All rights reserved. Please, keep 2 years: 2012, 2025. Even if you changed content the file is still present. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotProfilingInfoImpl.java line 2: > 1: /* > 2: * Copyright (c) 2012, 2025, Oracle and/or its affiliates. All rights reserved. This one is fine since you copied it from another file. ------------- PR Review: https://git.openjdk.org/jdk/pull/25397#pullrequestreview-2862651511 PR Review Comment: https://git.openjdk.org/jdk/pull/25397#discussion_r2103452147 PR Review Comment: https://git.openjdk.org/jdk/pull/25397#discussion_r2103452864 From sviswanathan at openjdk.org Thu May 22 22:10:58 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 22 May 2025 22:10:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v36] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 20:48:38 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactor is_P6_or_later and remove cpu_family==18 Updates look good. ------------- Marked as reviewed by sviswanathan (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2862671947 From vlivanov at openjdk.org Thu May 22 22:44:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 May 2025 22:44:11 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: Message-ID: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> > This PR introduces C2 support for `Reference.reachabilityFence()`. > > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. 
> > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. > > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/8d8f2c45..ad314e05 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=00-01 Stats: 78 lines in 13 files changed: 32 ins; 17 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Thu May 22 22:44:11 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 May 2025 22:44:11 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> References: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> Message-ID: On Thu, 22 May 2025 22:41:33 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. 
>> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. >> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > review feedback Thanks for the feedback, Aleksey, Tobias, and Christian. ------------- PR Review: https://git.openjdk.org/jdk/pull/25315#pullrequestreview-2862600156 From vlivanov at openjdk.org Thu May 22 22:44:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 May 2025 22:44:12 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:15:03 GMT, Aleksey Shipilev wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> review feedback > > src/hotspot/share/opto/block.cpp line 189: > >> 187: !get_node(end_idx)->is_Mach() && >> 188: !get_node(end_idx)->is_BoxLock() && >> 189: !(get_node(end_idx)->is_ReachabilityFence() && C->print_assembly())) { > > Um. So this fairly generic method is predicated of whether a diagnostic VM option is enabled. Which risks that the compiler behavior with/without printing assembly is different? Which might hide the very issues we are trying to diagnose with printing assembly? Can we handle `RF` here without checking for `print_assembly`? > > This will also obviate a need to pass `Compile* C` around. Fair enough. Treating RFs uniformly should be fine, since they are attached to safepoints now. 
> src/hotspot/share/opto/callnode.cpp line 969: > >> 967: projs->exobj = e; >> 968: } else { >> 969: // exception table for rethrow case > > Feels like we want to assert other values of `e->in(0)->as_CatchProj()->_con` here? From the switch statement in the previous hunk, there seems to be `fall_through_index` that is not "rethrow case" (or is it?). Makes sense. Added an assert. > src/hotspot/share/opto/compile.cpp line 3912: > >> 3910: // requires that the walk visits a node's inputs before visiting the node. >> 3911: >> 3912: static bool has_non_debug_uses(Node* n) { > > This got inserted right between `------------------------------final_graph_reshaping_walk--------------------` comment and the `final_graph_reshaping_walk` implementation. Also, put `has_non_debug_uses` into `Compile`? I moved it to `Node::has_non_debug_uses()`. > src/hotspot/share/opto/compile.cpp line 3995: > >> 3993: for (int j = start; j < end; j++) { >> 3994: Node* in = n->in(j); >> 3995: if (in->is_DecodeNarrowPtr() && (is_uncommon || has_non_debug_uses(in))) { > > The comment says we can skip when node is only referenced in debug info. Here we skip when there _are_ non-debug uses. Did you mean `!has_non_debug_uses(in)`? Good point. Fixed. Funnily enough, it did the right job here (but not in RF case!), because `has_non_debug_uses` reported the opposite answer to what can be expected based on its name :-) > src/hotspot/share/opto/parse1.cpp line 1225: > >> 1223: >> 1224: if (StressReachabilityFences) { >> 1225: // Keep all oop arguments alive until method return. > > Comment says "arguments", but we save locals. Aren't arguments on "stack" in `JVMState`? For stress mode, would make sense to hook up both locals/stack from `JVMState`, maybe? It happens inside callee context, so all arguments are already moved to locals. 
The code could explicitly iterate over arguments (using `argument(uint)` query) or enumerate only those locals which hold arguments, but that would require a special case for receiver. Iteration over locals (`[0 ... max_locals)`) is uniform and enumerates only arguments since everything else is top. > src/hotspot/share/opto/reachability.cpp line 211: > >> 209: } >> 210: } >> 211: return found; > > `found` is always `false` here. I don't think this function even needs a return value, judging by the uses. My intention was to signal when not-yet-known redundant fences are found. I prefer to keep it even if it is not used right now. Fixed by updating `found` along with pushing newly discovered redundant node on the list. > src/hotspot/share/opto/reachability.cpp line 431: > >> 429: } >> 430: } >> 431: redundant_rfs.push(rf); > > I see `PhaseIdealLoop::optimize_reachability_fences` asks for `redundant_rfs.member(rf)` before going into this analysis. Is it because we can have duplicate RFs in the `C->reachability_fence` list? Can we have duplicate here? Should we check `.member(rf)` here as well? No duplicated entries are allowed in `Compile::_reachability_fences`. `eliminate_reachability_fences()` is called at the end of loop opts after `optimize_reachability_fences()` is done (possibly, multiple times). `eliminate_reachability_fences()` migrates all referents from RFs to safepoints. So, every RF node is placed on `redundant_rfs` list and there are no RF nodes left after `eliminate_reachability_fences()` is over (there's `C->reachability_fences_count() == 0` assert at the end). After safepoints are directly linked to the referent, the corresponding RF node becomes redundant. So, I kept `redundant_rfs` as a name. I can choose a different one if you find it confusing. 
Another important aspect when it comes to determining RF redundancy is that `eliminate_reachability_fences()` takes all users into account while `optimize_reachability_fences()` trusts only `ReachabilityFence`s. > src/java.base/share/classes/java/lang/ref/Reference.java line 662: > >> 660: * @since 9 >> 661: */ >> 662: @IntrinsicCandidate > > Sounds like we also want to restore `@DontInline` to cover the case when intrinsic is not available / disabled for some compiler. I vaguely remember some intrinsic handling code checks whether method is prohibited from inlining (maybe affects only global `-XX:-Inline`, not sure), so it might be as straightforward. I'd like to use `-XX:DisableIntrinsic=_Reference_reachabilityFence` to switch to current behavior (no fence). Also, `@DontInline` would require special handling in C1 to unconditionally inline it. `@ForceInline` was there primarily to communicate the interaction with JVM. (Existing inlining heuristics should just unconditionally inline empty methods.) Once `@IntrinsicCandidate` is there, I don't see much value in any other annotations. 
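For readers less familiar with the API being intrinsified, the usual `Reference.reachabilityFence()` idiom looks roughly like the sketch below. The `Payload` class, its field, and `readAt` are hypothetical names chosen for illustration and are not taken from the PR or its tests; only `java.lang.ref.Reference.reachabilityFence(Object)` (available since Java 9) is a real API.

```java
import java.lang.ref.Reference;

public class ReachabilityFenceSketch {
    // Hypothetical holder whose backing storage must stay valid while read() is running.
    static final class Payload {
        private final int[] data;

        Payload(int size) {
            data = new int[size];
            for (int i = 0; i < size; i++) {
                data[i] = i * 2;
            }
        }

        int read(int index) {
            try {
                return data[index];
            } finally {
                // Keeps `this` strongly reachable at least until this point,
                // so a cleaner or finalizer cannot release resources mid-read.
                Reference.reachabilityFence(this);
            }
        }
    }

    // Hypothetical helper: the Payload instance has no other references after construction.
    static int readAt(int index) {
        return new Payload(8).read(index);
    }

    public static void main(String[] args) {
        System.out.println(readAt(3)); // prints 6
    }
}
```

With on-heap storage as above, the fence is not strictly needed for correctness; it matters when the object guards off-heap or otherwise externally managed memory.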
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103490049 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103488472 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103480961 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103482636 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103418703 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103447864 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103457174 PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103467390 From vlivanov at openjdk.org Thu May 22 22:44:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 May 2025 22:44:12 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:37:45 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/opto/compile.cpp line 3968: >> >>> 3966: return; >>> 3967: >>> 3968: // Go over ReachabilityFence nodes to skip DecodeN nodes for referents. >> >> This is a cute optimization. Does it happen in our code anywhere? I would have expected `DecodeN` to be near the heap loads, and suppose `RF` is mostly called on locals, which are already uncompressed? > > Now that I read the next hunk, should `is_DecodeN` be `is_DecodeNarrowPtr` to capture class loads (however unlikely that one is)? `ReachabilityFence` accepts only OOPs as a referent and `DecodeNKlass` produces `Klass` pointer. I suspect it may be the case for safepoints as well (and `is_DecodeNarrowPtr()` is a leftover from PermGen world), but I didn't check.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103480358 From vlivanov at openjdk.org Thu May 22 22:44:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 May 2025 22:44:12 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: <_y8LMhDg7b8EuWLikdsmgK0nUCCh0Y2PH0LX_-TpsD4=.fa9bb975-a850-4223-825e-726dcc5a74f2@github.com> References: <_y8LMhDg7b8EuWLikdsmgK0nUCCh0Y2PH0LX_-TpsD4=.fa9bb975-a850-4223-825e-726dcc5a74f2@github.com> Message-ID: On Wed, 21 May 2025 05:12:06 GMT, Christian Hagedorn wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> review feedback > > src/hotspot/share/opto/phasetype.hpp line 88: > >> 86: flags(PHASEIDEALLOOP2, "PhaseIdealLoop 2") \ >> 87: flags(PHASEIDEALLOOP3, "PhaseIdealLoop 3") \ >> 88: flags(OPTIMIZE_RF, "Optimize Reachability Fences") \ > > Another drive-by comment: I suggest to use the full word since most people are probably not aware of this abbreviation when looking at graph dumps in IGV. You should also add this phase to the IR framework [CompilePhases](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java). > > Suggestion: > > flags(OPTIMIZE_REACHABILITY_FENCES, "Optimize Reachability Fences") \ IMO it doesn't clarify things much. It's enum constant name and it's not shown in IGV. It is used in a single place [1] where it's clear what it refers to. [1] { // No more loop opts. It is safe to eliminate reachability fence nodes. 
TracePhase tp(_t_idealLoop); PhaseIdealLoop::optimize(igvn, LoopOptsEliminateRFs); print_method(PHASE_OPTIMIZE_RF, 2); if (failing()) return; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103476549 From vlivanov at openjdk.org Thu May 22 22:44:12 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 22 May 2025 22:44:12 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: Message-ID: On Tue, 20 May 2025 08:49:58 GMT, Tobias Hartmann wrote: >> Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: >> >> review feedback > > test/hotspot/jtreg/compiler/c2/TestReachabilityFence.java line 93: > >> 91: return payload[id][offset]; >> 92: } finally { >> 93: // Reference.reachabilityFence(this); > > Drive-by comment: Is this intentionally disabled? Good catch. Uncommented. (FTR I wasn't able to reproduce the problem with on-heap backing storage. Choosing between removing the test case which doesn't provoke the problem and keeping it, I chose the latter.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103471176 From cslucas at openjdk.org Thu May 22 22:45:30 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 22 May 2025 22:45:30 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details Message-ID: Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. ------------- Commit messages: - Add more info to nmethod flushing. 
Changes: https://git.openjdk.org/jdk/pull/25402/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357600 Stats: 23 lines in 1 file changed: 16 ins; 3 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25402/head:pull/25402 PR: https://git.openjdk.org/jdk/pull/25402 From dlong at openjdk.org Thu May 22 23:47:24 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 22 May 2025 23:47:24 GMT Subject: RFR: 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 Message-ID: This appears to be mostly harmless, but we should fix it anyway. The initial sentinel PcDesc has a pc_offset of -1. We can prevent looking before the sentinel by reversing the condition so that pc[0] is checked before pc[-1]. ------------- Commit messages: - check pc[0] before pc[-1] Changes: https://git.openjdk.org/jdk/pull/25404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25404&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357468 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25404/head:pull/25404 PR: https://git.openjdk.org/jdk/pull/25404 From dlong at openjdk.org Fri May 23 00:12:51 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 23 May 2025 00:12:51 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:34:15 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - A test > - Merge branch 'master' into fix/do_unroll-assert > - Relax the assert Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25295#pullrequestreview-2862864207 From dlong at openjdk.org Fri May 23 00:29:53 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 23 May 2025 00:29:53 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Mon, 24 Mar 2025 13:18:25 GMT, Afshin Zafari wrote: > The `offset` variable used in left-shift op can be a large number with its sign-bit set. This makes a negative value which is UB for left-shift and is reported as > `runtime error: left shift of negative value -25 at relocInfo.cpp:...` > > Using `java_shift_left()` function is the workaround to avoid UB. This function uses reinterpret_cast to cast from signed to unsigned and back. > > Tests: > linux-x64-debug tier1 on a UBSAN enabled build. I suspect that the `&= right_n_bits()` trick I proposed for 8352140 would also work for this case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2902937612 From kvn at openjdk.org Fri May 23 00:42:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 00:42:51 GMT Subject: RFR: 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 In-Reply-To: References: Message-ID: On Thu, 22 May 2025 23:43:09 GMT, Dean Long wrote: > This appears to be mostly harmless, but we should fix it anyway. The initial sentinel PcDesc has a pc_offset of -1. We can prevent looking before the sentinel by reversing the condition so that pc[0] is checked before pc[-1]. Good ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/25404#pullrequestreview-2862889558 From jbhateja at openjdk.org Fri May 23 01:48:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 23 May 2025 01:48:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v36] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 20:48:38 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactor is_P6_or_later and remove cpu_family==18 Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2862950073 From dlong at openjdk.org Fri May 23 01:51:50 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 23 May 2025 01:51:50 GMT Subject: RFR: 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 In-Reply-To: References: Message-ID: On Thu, 22 May 2025 23:43:09 GMT, Dean Long wrote: > This appears to be mostly harmless, but we should fix it anyway. The initial sentinel PcDesc has a pc_offset of -1. We can prevent looking before the sentinel by reversing the condition so that pc[0] is checked before pc[-1]. Thanks Vladimir. 
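The reordering described in the quoted fix relies on short-circuit evaluation: once the current entry is tested first, the sentinel rejects the match before the previous entry is ever read. A minimal sketch of that idea in Java follows, assuming a hypothetical sorted `OFFSETS` table whose element 0 plays the role of the sentinel with offset -1. The names and the exact shape of the condition are assumptions for illustration, not the HotSpot code.

```java
public class SentinelCheckSketch {
    // Hypothetical table: index 0 is the sentinel (offset -1), followed by ascending offsets.
    static final int[] OFFSETS = {-1, 4, 10, 32};

    // True if OFFSETS[i] is the first entry whose offset is >= pcOffset.
    // OFFSETS[i] is tested first: for the sentinel (i == 0) the test
    // pcOffset <= -1 fails for any non-negative pcOffset, so the && short-circuits
    // and OFFSETS[i - 1] is never read.
    static boolean matchApproximate(int i, int pcOffset) {
        return pcOffset <= OFFSETS[i] && OFFSETS[i - 1] < pcOffset;
    }

    public static void main(String[] args) {
        System.out.println(matchApproximate(0, 5)); // false, without touching OFFSETS[-1]
        System.out.println(matchApproximate(2, 5)); // true: 4 < 5 <= 10
        System.out.println(matchApproximate(1, 5)); // false: 5 > 4
    }
}
```

With the two operands swapped, the call for index 0 would read the element before the table first; Java reports that as an `ArrayIndexOutOfBoundsException`, whereas in C++ it is the kind of out-of-bounds read that ASan flags.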
------------- PR Comment: https://git.openjdk.org/jdk/pull/25404#issuecomment-2903029898 From syan at openjdk.org Fri May 23 02:44:51 2025 From: syan at openjdk.org (SendaoYan) Date: Fri, 23 May 2025 02:44:51 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:19:08 GMT, Daniel Skantz wrote: > This pull request contains a fix for JDK-8357105. > > The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. > > The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. > > Testing: > Tier1-4. > > Extra testing: > Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. Changes requested by syan (Committer). test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java line 38: > 36: > 37: public static void main (String... args) { > 38: for (int i = 0; i < 1_000_000; i ++) { How about `i ++` replaced as `i++`. The whitespace seems do not need. test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java line 39: > 37: public static void main (String... 
args) { > 38: for (int i = 0; i < 1_000_000; i ++) { > 39: f(" "); Should we use the function's return value, to keep the compiler from eliminating the call as dead code? ------------- PR Review: https://git.openjdk.org/jdk/pull/25395#pullrequestreview-2863000077 PR Review Comment: https://git.openjdk.org/jdk/pull/25395#discussion_r2103686850 PR Review Comment: https://git.openjdk.org/jdk/pull/25395#discussion_r2103690822
From fyang at openjdk.org Fri May 23 02:52:54 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 23 May 2025 02:52:54 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v17] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 10:55:19 GMT, Anjian-Wen wrote:
>> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time
>>
>> on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below
>>
>> before the patch
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt    Score   Error  Units
>> MemorySegmentZeroUnsafe.panama       true       1  avgt   30   24.198 ± 0.392  ns/op
>> MemorySegmentZeroUnsafe.panama       true       2  avgt   30   20.688 ± 0.013  ns/op
>> MemorySegmentZeroUnsafe.panama       true       3  avgt   30   20.703 ± 0.045  ns/op
>> MemorySegmentZeroUnsafe.panama       true       4  avgt   30   20.053 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       5  avgt   30   20.682 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama       true       6  avgt   30   20.732 ± 0.061  ns/op
>> MemorySegmentZeroUnsafe.panama       true       7  avgt   30   21.403 ± 0.096  ns/op
>> MemorySegmentZeroUnsafe.panama       true       8  avgt   30   25.268 ± 0.197  ns/op
>> MemorySegmentZeroUnsafe.panama       true      15  avgt   30   27.481 ± 0.195  ns/op
>> MemorySegmentZeroUnsafe.panama       true      16  avgt   30   27.577 ± 0.019  ns/op
>> MemorySegmentZeroUnsafe.panama       true      63  avgt   30  208.893 ± 2.795  ns/op
>> MemorySegmentZeroUnsafe.panama       true      64  avgt   30  199.167 ± 0.936  ns/op
>> MemorySegmentZeroUnsafe.panama       true     255  avgt   30  220.672 ± 0.879  ns/op
>> MemorySegmentZeroUnsafe.panama       true     256  avgt   30  246.256 ± 0.756  ns/op
>> MemorySegmentZeroUnsafe.panama      false       1  avgt   30   23.849 ± 0.088  ns/op
>> MemorySegmentZeroUnsafe.panama      false       2  avgt   30   20.671 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false       3  avgt   30   20.694 ± 0.037  ns/op
>> MemorySegmentZeroUnsafe.panama      false       4  avgt   30   20.048 ± 0.010  ns/op
>> MemorySegmentZeroUnsafe.panama      false       5  avgt   30   20.684 ± 0.020  ns/op
>> MemorySegmentZeroUnsafe.panama      false       6  avgt   30   20.685 ± 0.016  ns/op
>> MemorySegmentZeroUnsafe.panama      false       7  avgt   30   21.383 ± 0.086  ns/op
>> MemorySegmentZeroUnsafe.panama      false       8  avgt   30   25.684 ± 0.006  ns/op
>> MemorySegmentZeroUnsafe.panama      false      15  avgt   30   27.593 ± 0.043  ns/op
>> MemorySegmentZeroUnsafe.panama      false      16  avgt   30   28.437 ± 0.228  ns/o...
>
> Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision:
>
> add new line for bind
Marked as reviewed by fyang (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/23890#pullrequestreview-2863012393
From iveresov at openjdk.org Fri May 23 02:54:55 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 23 May 2025 02:54:55 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v24] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs.
> > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request incrementally with one additional commit since the last revision: Missing part of the merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24886/files - new: https://git.openjdk.org/jdk/pull/24886/files/7a350671..a1958ece Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=22-23 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From duke at openjdk.org Fri May 23 03:42:58 2025 From: duke at openjdk.org (duke) Date: Fri, 23 May 2025 03:42:58 GMT Subject: RFR: 8351140: RISC-V: Intrinsify Unsafe::setMemory [v17] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 10:55:19 GMT, Anjian-Wen wrote: >> From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic?s generator generate_unsafe_setmemory. This intrinsic optimizes about quite a lot unsafe setmemory time >> >> on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below >> >> before the patch >> >> Benchmark (aligned) (size) Mode Cnt Score Error Units >> MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ? 0.392 ns/op >> MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ? 0.013 ns/op >> MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ? 0.045 ns/op >> MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ? 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ? 0.016 ns/op >> MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ? 0.061 ns/op >> MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ? 0.096 ns/op >> MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ? 0.197 ns/op >> MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ? 
0.195 ns/op >> MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op >> MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op >> MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op >> MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op >> MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op >> MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op >> MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op >> MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op >> MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op >> MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op >> MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op >> MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op >> MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op >> MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/o... > > Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: > > add new line for bind @Anjian-Wen Your change (at version 87165277c1d29166aab69e2456db539ccedd0380) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23890#issuecomment-2903153814 From duke at openjdk.org Fri May 23 03:50:58 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 23 May 2025 03:50:58 GMT Subject: Integrated: 8351140: RISC-V: Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Tue, 4 Mar 2025 09:46:53 GMT, Anjian-Wen wrote: > From [JDK-8329331](https://bugs.openjdk.org/browse/JDK-8329331), add riscv unsafe::setMemory intrinsic's generator generate_unsafe_setmemory.
This intrinsic optimizes about quite a lot unsafe setmemory time > > on my musebook, the JMH test micro:java.lang.foreign.MemorySegmentZeroUnsafe shows below > > before the patch > > Benchmark (aligned) (size) Mode Cnt Score Error Units > MemorySegmentZeroUnsafe.panama true 1 avgt 30 24.198 ± 0.392 ns/op > MemorySegmentZeroUnsafe.panama true 2 avgt 30 20.688 ± 0.013 ns/op > MemorySegmentZeroUnsafe.panama true 3 avgt 30 20.703 ± 0.045 ns/op > MemorySegmentZeroUnsafe.panama true 4 avgt 30 20.053 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 5 avgt 30 20.682 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama true 6 avgt 30 20.732 ± 0.061 ns/op > MemorySegmentZeroUnsafe.panama true 7 avgt 30 21.403 ± 0.096 ns/op > MemorySegmentZeroUnsafe.panama true 8 avgt 30 25.268 ± 0.197 ns/op > MemorySegmentZeroUnsafe.panama true 15 avgt 30 27.481 ± 0.195 ns/op > MemorySegmentZeroUnsafe.panama true 16 avgt 30 27.577 ± 0.019 ns/op > MemorySegmentZeroUnsafe.panama true 63 avgt 30 208.893 ± 2.795 ns/op > MemorySegmentZeroUnsafe.panama true 64 avgt 30 199.167 ± 0.936 ns/op > MemorySegmentZeroUnsafe.panama true 255 avgt 30 220.672 ± 0.879 ns/op > MemorySegmentZeroUnsafe.panama true 256 avgt 30 246.256 ± 0.756 ns/op > MemorySegmentZeroUnsafe.panama false 1 avgt 30 23.849 ± 0.088 ns/op > MemorySegmentZeroUnsafe.panama false 2 avgt 30 20.671 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 3 avgt 30 20.694 ± 0.037 ns/op > MemorySegmentZeroUnsafe.panama false 4 avgt 30 20.048 ± 0.010 ns/op > MemorySegmentZeroUnsafe.panama false 5 avgt 30 20.684 ± 0.020 ns/op > MemorySegmentZeroUnsafe.panama false 6 avgt 30 20.685 ± 0.016 ns/op > MemorySegmentZeroUnsafe.panama false 7 avgt 30 21.383 ± 0.086 ns/op > MemorySegmentZeroUnsafe.panama false 8 avgt 30 25.684 ± 0.006 ns/op > MemorySegmentZeroUnsafe.panama false 15 avgt 30 27.593 ± 0.043 ns/op > MemorySegmentZeroUnsafe.panama false 16 avgt 30 28.437 ± 0.228 ns/op > MemorySegmentZeroUnsafe.panama false 63 avgt 30...
This pull request has now been integrated. Changeset: 1fd65b7a Author: Anjian-Wen Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/1fd65b7a7b0ec38fde79aa4f5e53506d28893439 Stats: 125 lines in 1 file changed: 125 ins; 0 del; 0 mod 8351140: RISC-V: Intrinsify Unsafe::setMemory Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/23890 From kvn at openjdk.org Fri May 23 04:35:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 04:35:54 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25389#pullrequestreview-2863129237 From kvn at openjdk.org Fri May 23 04:35:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 04:35:54 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:24:52 GMT, Roland Westrelin wrote: >> In the test case, a non escaping array is initialized by an >> `arraycopy` that uses this array as source and destination. Following >> the `arraycopy`, one of the element of the array is tested for >> `null`. 
That null check is constant folded to always `null` by escape >> analysis. As I understand, the `Allocate` for the array should be >> marked by EA as destination of an array copy. That state should then >> be propagated by EA to uses and all destinations of an array copy >> should be marked as unknown value. But EA has logic that explicitly >> skips the case where an `ArrayCopy` has same source and >> destination. Removing that logic fixes the failure. > > @vnkozlov you added that code with 7147744. What do you think? Hi @rwestrel The comment simply said that the state of the destination's fields (array elements) should be tracked by adding a special (arraycopy) edge when the destination and source are different. If the source and destination are the same, the connected edge is not needed because they were already argument-escape. In the original EA code we always set all arguments of arraycopy as argument-escape: [escape.cpp#L852](https://github.com/openjdk/jdk8u-dev/blob/e5f92a2396e9b0922c5e42dc809ad827052a9352/hotspot/src/share/vm/opto/escape.cpp#L852). Then you optimized it in [JDK-8076188](https://github.com/openjdk/jdk/commit/a9cdbd04076149927fc7c13704eb01ea32e2ca6f) by setting the arguments of arraycopy to the NoEscape state. That invalidated the original assumption, and now we should take arraycopy into account. Your fix looks reasonable for the current code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25389#issuecomment-2903218802 From kvn at openjdk.org Fri May 23 04:40:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 04:40:50 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`.
That null check is constant folded to always `null` by escape > analysis. As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Maybe it would be more accurate to check the escape state: `if (arg_ptn != src_ptn || es == PointsToNode::NoEscape) {` ------------- Changes requested by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25389#pullrequestreview-2863138100 From vlivanov at openjdk.org Fri May 23 04:44:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 May 2025 04:44:52 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> References: <0WKwHjzEn5dxYLkonrk4h9yfMI3r3bKDdqgG06J69N4=.e19e9441-6197-4d53-a4f4-b196a81f69d8@github.com> Message-ID: On Thu, 22 May 2025 22:44:11 GMT, Vladimir Ivanov wrote: >> This PR introduces C2 support for `Reference.reachabilityFence()`. >> >> After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. >> >> `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality.
>> >> Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. >> >> Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 >> "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." >> >> Testing: >> - [x] hs-tier1 - hs-tier8 >> - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations >> - [x] java/lang/foreign microbenchmarks > > Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: > > review feedback >> Representing ReachabilityFence as memory barrier (e.g., MemBarCPUOrder) would solve the issue, but performance costs are prohibitively high. > How bad is it? MemBarCPUOrder pinches all memory, so I assume this breaks a lot of optimizations when RF is sitting in the hot loop? 
I remember we went through a similar exercise with Blackholes: [JDK-8296545](https://bugs.openjdk.org/browse/JDK-8296545) -- and decided to pinch only the control. I'm guessing this is not enough to fix RF, or is it? Yes, if a barrier stays inside the loop body, it breaks a lot of important optimizations. It may end up almost as bad as a full-blown call (except a barrier can be moved around while a call can't). And moving a node when it depends both on control and memory is more complicated than just a CFG node. Moreover, as you can see in the proposed solution, even a CFG-only representation is problematic for loop opts, so additional care is needed to ensure RFs are moved out of loops. As an alternative approach, I thought about reifying RF as a data node (think of `CastPP`) and then linking its referent to all safepoints it dominates after loop opts are over. But that would only affect `optimize_reachability_fences()`. Everything else would stay the same. So, I decided to stay with the CFG-only representation for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25315#issuecomment-2903245196 From kvn at openjdk.org Fri May 23 04:51:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 04:51:50 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: <14ISbyfFWRwTEptGP6f9zzLdcCW9-JnkBmqYE3Fh37s=.9c2e9da1-d036-455d-a5da-84cd64cf4e9d@github.com> On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy.
That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Marked as reviewed by kvn (Reviewer). On the other hand, we may not propagate the global-escape state, which may affect locking optimizations. Okay, your fix is good. ------------- PR Review: https://git.openjdk.org/jdk/pull/25389#pullrequestreview-2863156117 PR Comment: https://git.openjdk.org/jdk/pull/25389#issuecomment-2903254599 From fjiang at openjdk.org Fri May 23 06:19:37 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 23 May 2025 06:19:37 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v2] In-Reply-To: References: Message-ID: > Please consider. > As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. > > This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions. > > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 7 avgt 12 27.215 ± 0.073 ns/op > ArrayFill.fillByteArray 15 avgt 12 32.687 ± 0.904 ns/op > ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.006 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.351 ± 0.009 ns/op > ArrayFill.fillShortArray 7 avgt 12 30.776 ± 0.006 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.724 ± 0.447 ns/op > ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.006 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.685 ± 0.900 ns/op > ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.007 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.352 ± 0.011 ns/op > ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.006 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.497 ±
0.012 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 7 avgt 12 20.137 ± 0.042 ns/op > ArrayFill.fillByteArray 15 avgt 12 32.928 ± 0.004 ns/op > ArrayFill.fillIntArray 7 avgt 12 28.630 ± 0.004 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.344 ± 0.005 ns/op > ArrayFill.fillShortArray 7 avgt 12 31.494 ± 0.004 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.492 ± 0.008 ns/op > ArrayFill.zeroByteArray 7 avgt 12 19.980 ± 0.164 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.927 ± 0.004 ns/op > ArrayFill.zeroIntArray 7 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.006 ns/op > ArrayFill.zeroShortArray 7 avgt 12 32.193 ± 0.027 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.495 ± 0.010 ns/op > > > Testing: > - [x] tier1 Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into riscv-optimize-generate-fill - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill - optimize array fill stub for small size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25350/files - new: https://git.openjdk.org/jdk/pull/25350/files/5deb7146..79e09023 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=00-01 Stats: 2391 lines in 61 files changed: 1664 ins; 474 del; 253 mod Patch: https://git.openjdk.org/jdk/pull/25350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25350/head:pull/25350 PR: https://git.openjdk.org/jdk/pull/25350 From dnsimon at openjdk.org Fri May 23 06:19:42 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 06:19:42 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: >
Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). > This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fix copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25397/files - new: https://git.openjdk.org/jdk/pull/25397/files/d95475b0..12a9a059 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25397&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25397&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25397/head:pull/25397 PR: https://git.openjdk.org/jdk/pull/25397 From dnsimon at openjdk.org Fri May 23 06:19:42 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 06:19:42 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 21:56:06 GMT, Vladimir Kozlov wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> fix copyright > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotProfilingInfo.java line 2: > >> 1: /* >> 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. > > Please, keep 2 years: 2012, 2025. Even if you changed content the file is still present. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25397#discussion_r2103880075 From chagedorn at openjdk.org Fri May 23 06:32:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 06:32:59 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:34:15 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - A test > - Merge branch 'master' into fix/do_unroll-assert > - Relax the assert Looks good to me, too! Thanks for spending more time on the reproducer and the interesting discussion offline! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25295#pullrequestreview-2863341196 From thartmann at openjdk.org Fri May 23 06:49:00 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 May 2025 06:49:00 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: References: Message-ID: <2dyLRjqXvCTkW8UGbCqjyOQG6mNv7y6NPZf_YmQyBYA=.97c0c810-d82f-4748-9994-09ad6c32ae98@github.com> On Thu, 22 May 2025 15:34:15 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - A test > - Merge branch 'master' into fix/do_unroll-assert > - Relax the assert test/hotspot/jtreg/compiler/loopopts/UnrollWideLoopHitsTooStrictAssert.java line 27: > 25: * @test > 26: * @bug 8356647 > 27: * @summary C2's unrolling code has a too strict assert when a counted loop's range as wide as int's. Suggestion: * @summary C2's unrolling code has a too strict assert when a counted loop's range is as wide as an int. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2103918418 From chagedorn at openjdk.org Fri May 23 06:49:03 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 06:49:03 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v24] In-Reply-To: <6pb3IetxpG89G3u8BkV-5Lt8pQSG98qZyGq75TRxvOU=.0d9b6fc1-bd7c-4384-bcfc-7c8861d77abf@github.com> References: <6pb3IetxpG89G3u8BkV-5Lt8pQSG98qZyGq75TRxvOU=.0d9b6fc1-bd7c-4384-bcfc-7c8861d77abf@github.com> Message-ID: On Thu, 22 May 2025 14:05:09 GMT, Roland Westrelin wrote: >> Good catch. That is now off as well. Additionally, it should probably be `TraceLoopUnswitching` and not `TraceLoopPredicate`. >> >> We could return the `ParsePredicate` from `clone_parse_predicate()` which is called from `CloneUnswitchedLoopPredicatesVisitor::visit()` and then call it from there. Maybe something like below? >> >>
>> Patch Suggestion (untested) >> >> >> Index: src/hotspot/share/opto/predicates.hpp >> IDEA additional info: >> Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP >> <+>UTF-8 >> =================================================================== >> diff --git a/src/hotspot/share/opto/predicates.hpp b/src/hotspot/share/opto/predicates.hpp >> --- a/src/hotspot/share/opto/predicates.hpp (revision a0cdf36bdfeca9cd8b669859700d63d5ee627458) >> +++ b/src/hotspot/share/opto/predicates.hpp (date 1747831252516) >> @@ -288,8 +288,6 @@ >> } >> >> static ParsePredicateNode* init_parse_predicate(const Node* parse_predicate_proj, Deoptimization::DeoptReason deopt_reason); >> - NOT_PRODUCT(static void trace_cloned_parse_predicate(bool is_false_path_loop, >> - const ParsePredicateSuccessProj* success_proj);) >> >> public: >> ParsePredicate(Node* parse_predicate_proj, Deoptimization::DeoptReason deopt_reason) >> @@ -320,8 +318,8 @@ >> return _success_proj; >> } >> >> - ParsePredicate clone_to_unswitched_loop(Node* new_control, bool is_false_path_loop, >> - PhaseIdealLoop* phase) const; >> + ParsePredicate clone_to_unswitched_loop(Node* new_control, bool is_false_path_loop, PhaseIdealLoop* phase) const; >> + NOT_PRODUCT(void trace_cloned_parse_predicate(bool is_false_path_loop) const;) >> >> void kill(PhaseIterGVN& igvn) const; >> }; >> @@ -1158,10 +1156,11 @@ >> ClonePredicateToTargetLoop(LoopNode* target_loop_head, const NodeInLoopBody& node_in_loop_body, PhaseIdealLoop* phase); >> >> // Clones the provided Parse Predicate to the head of the current predicate chain at the target loop. >> - void clone_parse_predicate(const ParsePredicate& parse_predicate, bool is_false_path_loop) { >> + ParsePredicate clone_parse_predicate(const ParsePredicate& parse_predicate, bool is_false_path_loop) { >> ParsePredicate cloned_parse_predicate = parse_predicate.clone_to_unswitched_loop(_old_target_loop_entry, >> is_false_path_loop, _p... > > Thanks for the patch. 
I applied it and did some smoke testing. I think there's a mistake at the end: > > - _clone_predicate_to_true_path_loop.clone_parse_predicate(parse_predicate, false); > - _clone_predicate_to_false_path_loop.clone_parse_predicate(parse_predicate, true); > + clone_parse_predicate(parse_predicate, false); > + clone_parse_predicate(parse_predicate, true); > > and > > +void CloneUnswitchedLoopPredicatesVisitor::clone_parse_predicate(const ParsePredicate& parse_predicate, > + const bool is_false_path_loop) { > + const ParsePredicate cloned_parse_predicate = > + _clone_predicate_to_true_path_loop.clone_parse_predicate(parse_predicate, false); > + NOT_PRODUCT(cloned_parse_predicate.trace_cloned_parse_predicate(is_false_path_loop);) > +} > > lines added only use `_clone_predicate_to_true_path_loop` and not `_clone_predicate_to_false_path_loop`. Commit I pushed should fix that. Good catch, thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103918467 From thartmann at openjdk.org Fri May 23 06:50:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 May 2025 06:50:51 GMT Subject: RFR: 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 In-Reply-To: References: Message-ID: On Thu, 22 May 2025 23:43:09 GMT, Dean Long wrote: > This appears to be mostly harmless, but we should fix it anyway. The initial sentinel PcDesc has a pc_offset of -1. We can prevent looking before the sentinel by reversing the condition so that pc[0] is checked before pc[-1]. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25404#pullrequestreview-2863375804 From thartmann at openjdk.org Fri May 23 06:57:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 May 2025 06:57:51 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop In-Reply-To: References: Message-ID: <2T83tcCFOvCp4BElLS5ufAb7RR2ZkEUFXPOWEOAhVYg=.ed4262e0-5f36-4f95-9c4c-b4f750b6e555@github.com> On Thu, 22 May 2025 15:53:18 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because > we ran into some issues where a `Type` node is sunk and then becomes > `top` but the control path of its uses doesn't become unreachable. > > 8349479 should have fixed that so that exception no longer makes > sense. That looks good to me but given that we had quite a few bugs in that area in the past, I would suggest to only integrate into JDK 26 after the fork on June 05, 2025. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25396#pullrequestreview-2863393272 From chagedorn at openjdk.org Fri May 23 07:03:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 07:03:00 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 14:08:20 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). 
>> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ...
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review A few minor last comments but otherwise, it looks good to me! Thanks for all the updates, the patience and the credit! src/hotspot/share/opto/loopnode.cpp line 1235: > 1233: // Template Assertion Predicates > 1234: // | > 1235: // Loop Suggestion: // Existing Hoisted // Check Predicates // | // New Short Running Long // Loop Predicate // | // Cloned Parse Predicates and // Template Assertion Predicates // | // Loop src/hotspot/share/opto/predicates.cpp line 86: > 84: > 85: ParsePredicate ParsePredicate::clone_to_loop(Node* new_control, const bool rewire_uncommon_proj_phi_inputs, > 86: PhaseIdealLoop* phase) const { Suggestion: ParsePredicate ParsePredicate::clone_to_loop(Node* new_control, const bool rewire_uncommon_proj_phi_inputs, PhaseIdealLoop* phase) const { test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java line 2: > 1: /* > 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. Suggestion: * Copyright (c) 2025, Red Hat, Inc. All rights reserved. test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java line 39: > 37: // possible to optimize long RC. Finally unrolling happen which > 38: // require the Assert Predicates to have been properly copied when the > 39: // loop was transformed for the long range check. Suggestion: // require the Assertion Predicates to have been properly copied when // the loop was transformed for the long range check. test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoop.java line 2: > 1: /* > 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. Suggestion: * Copyright (c) 2025, Red Hat, Inc. All rights reserved. test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopPredicatesClone.java line 2: > 1: /* > 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. 
Suggestion: * Copyright (c) 2025, Red Hat, Inc. All rights reserved. test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopScaleOverflow.java line 2: > 1: /* > 2: * Copyright (c) 2024, Red Hat, Inc. All rights reserved. Suggestion: * Copyright (c) 2025, Red Hat, Inc. All rights reserved. test/micro/org/openjdk/bench/java/lang/foreign/HeapMismatchManualLoopTest.java line 2: > 1: /* > 2: * Copyright (c) 2020, 2023, Oracle and/or its affiliates. All rights reserved. Should probably also be 2025? Suggestion: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2863375615 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103928708 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103920931 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103933089 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103923180 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103933286 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103933791 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103933982 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2103934808 From thartmann at openjdk.org Fri May 23 07:03:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 May 2025 07:03:50 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. 
As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Looks good to me too. I'll run some testing and report back once it passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25389#pullrequestreview-2863405720 From epeter at openjdk.org Fri May 23 07:12:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 07:12:59 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:34:15 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - A test > - Merge branch 'master' into fix/do_unroll-assert > - Relax the assert src/hotspot/share/opto/loopTransform.cpp line 1911: > 1909: } else if (loop_head->has_exact_trip_count() && init->is_Con()) { > 1910: // We should not be here if we have old_trip_count == max_juint > 1911: // it would make trip_count == 2^31 which causes overflow and the situation is overall weird Can you say something a little more specific than "weird"? As a reader, it's not immediately clear what that may imply. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2103956123 From chagedorn at openjdk.org Fri May 23 07:15:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 07:15:51 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop In-Reply-To: References: Message-ID: <6x8dbtEZAnT_977LpZL_0SLggYCz9Q8IgQYbYWkoQuI=.f2c683a2-7cf6-46c3-ae03-f1ee117efd75@github.com> On Thu, 22 May 2025 15:53:18 GMT, Roland Westrelin wrote: > `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because > we ran into some issues where a `Type` node is sunk and then becomes > `top` but the control path of its uses doesn't become unreachable. > > 8349479 should have fixed that so that exception no longer makes > sense. Otherwise, looks reasonable to me, too. src/hotspot/share/opto/loopopts.cpp line 1688: > 1686: !n->is_OpaqueInitializedAssertionPredicate() && > 1687: !n->is_OpaqueTemplateAssertionPredicate() && > 1688: !n->is_Type()) { I cannot remember exactly, how often was it a problem without JDK-8349479? If it was more common, we might want to only allow it when `KillPathsReachableByDeadTypeNode` is set. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25396#pullrequestreview-2863435232 PR Review Comment: https://git.openjdk.org/jdk/pull/25396#discussion_r2103959595 From chagedorn at openjdk.org Fri May 23 07:19:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 07:19:52 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. 
As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Looks good to me, too. test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopySameSrcDstInitializesNonEscapingArray.java line 29: > 27: * @summary Unexpected null in C2 compiled code > 28: * @run main/othervm -XX:-BackgroundCompilation TestArrayCopySameSrcDstInitializesNonEscapingArray > 29: * @run main/othervm TestArrayCopySameSrcDstInitializesNonEscapingArray `othervm` can be removed: Suggestion: * @run main TestArrayCopySameSrcDstInitializesNonEscapingArray test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopySameSrcDstInitializesNonEscapingArray.java line 59: > 57: field = null; > 58: } > 59: Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25389#pullrequestreview-2863439351 PR Review Comment: https://git.openjdk.org/jdk/pull/25389#discussion_r2103962368 PR Review Comment: https://git.openjdk.org/jdk/pull/25389#discussion_r2103962530 From thartmann at openjdk.org Fri May 23 07:20:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 May 2025 07:20:53 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Thu, 22 May 2025 18:05:54 GMT, Kim Barrett wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Increase number of parameters as suggested by Manuel > > src/hotspot/share/opto/idealGraphPrinter.cpp line 676: > >> 674: } else if (t->base() == Type::AnyPtr) { >> 675: if (t->is_ptr()->ptr() == TypePtr::Null) { >> 676: print_prop(short_name, "NULL"); > > I'm surprised this doesn't trip over sources/TestNoNULL.java. Actually it does, see failures in github actions testing: > Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Test found 1 usages of 'NULL' in source files. See errors above. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2103968467 From thartmann at openjdk.org Fri May 23 07:25:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 23 May 2025 07:25:51 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Thu, 22 May 2025 14:44:07 GMT, Christian Hagedorn wrote: >> When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGrapPrinter.cpp` which says that maximally 2 chars are allowed for numbers: >> https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 >> >> But we already allow larger entries today: >> ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) >> >> I there propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. >> >> Without patch: >> ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) >> >> >> With patch: >> ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Increase number of parameters as suggested by Manuel Looks good to me otherwise. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25393#pullrequestreview-2863460785 From chagedorn at openjdk.org Fri May 23 07:26:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 07:26:53 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v2] In-Reply-To: References: <_y8LMhDg7b8EuWLikdsmgK0nUCCh0Y2PH0LX_-TpsD4=.fa9bb975-a850-4223-825e-726dcc5a74f2@github.com> Message-ID: On Thu, 22 May 2025 22:19:25 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/phasetype.hpp line 88: >> >>> 86: flags(PHASEIDEALLOOP2, "PhaseIdealLoop 2") \ >>> 87: flags(PHASEIDEALLOOP3, "PhaseIdealLoop 3") \ >>> 88: flags(OPTIMIZE_RF, "Optimize Reachability Fences") \ >> >> Another drive-by comment: I suggest to use the full word since most people are probably not aware of this abbreviation when looking at graph dumps in IGV. You should also add this phase to the IR framework [CompilePhases](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/CompilePhase.java). >> >> Suggestion: >> >> flags(OPTIMIZE_REACHABILITY_FENCES, "Optimize Reachability Fences") \ > > IMO it doesn't clarify things much. It's enum constant name and it's not shown in IGV. > > It is used in a single place [1] where it's clear what it refers to. > > [1] > > { // No more loop opts. It is safe to eliminate reachability fence nodes. > TracePhase tp(_t_idealLoop); > PhaseIdealLoop::optimize(igvn, LoopOptsEliminateRFs); > print_method(PHASE_OPTIMIZE_RF, 2); > if (failing()) return; > } That's a good point, there it really does not matter. Another thought I've just had: When you add it to the `CompilePhase` class in the IR framework and it would be nice to have the same name. There, it would be beneficial to have the full name since people then only see `CompilePhase::OPTIMIZE_RF`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2103978331 From mchevalier at openjdk.org Fri May 23 07:32:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 May 2025 07:32:51 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Thu, 22 May 2025 14:44:07 GMT, Christian Hagedorn wrote: >> When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGrapPrinter.cpp` which says that maximally 2 chars are allowed for numbers: >> https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 >> >> But we already allow larger entries today: >> ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) >> >> I there propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. >> >> Without patch: >> ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) >> >> >> With patch: >> ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Increase number of parameters as suggested by Manuel I like it. src/hotspot/share/opto/idealGraphPrinter.cpp line 657: > 655: > 656: // Only use up to 4 chars and fall back to a generic "L" to keep it short. 
> 657: if (value >= -999 && value <= 9999) { Just for the sake of saying something, can we somewhat factor this test? And ideally, it could be marginally nicer, if I want to see 6 chars, to just replace a 4 into a 6 and it'd work magically. But I don't really see a way beside length(to_string(value)) <= max_length, and it's not really efficient so... If you have a brilliant idea, good. Otherwise, feel free to ignore: changing it as it is now is easy too. ------------- Marked as reviewed by mchevalier (Committer). PR Review: https://git.openjdk.org/jdk/pull/25393#pullrequestreview-2863481391 PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2103989707 From chagedorn at openjdk.org Fri May 23 07:37:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 07:37:50 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Fri, 23 May 2025 07:18:35 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/idealGraphPrinter.cpp line 676: >> >>> 674: } else if (t->base() == Type::AnyPtr) { >>> 675: if (t->is_ptr()->ptr() == TypePtr::Null) { >>> 676: print_prop(short_name, "NULL"); >> >> I'm surprised this doesn't trip over sources/TestNoNULL.java. > > Actually it does, see failures in github actions testing: >> Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Test found 1 usages of 'NULL' in source files. See errors above. Interesting! You're right. I haven't checked the testing results, yet. Should we just change to `Null` or make an exclusion in `TestNoNULL`? 
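On the width check quoted above (`value >= -999 && value <= 9999`): a minimal sketch of the "count the characters actually written" idea, so that the limit becomes a single parameter. This is not the code from the PR. The helper name `condensed_label`, its parameters, and the use of plain `std::snprintf` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper, not HotSpot code: format `value` in decimal and keep
// the text only if it needs at most `max_chars` characters (sign included).
// Otherwise return the generic `fallback` label (for example "L" or "I").
std::string condensed_label(long long value, int max_chars, const std::string& fallback) {
  assert(max_chars > 0);
  char buf[32];  // large enough for any 64-bit decimal value, sign and terminator
  // std::snprintf returns the number of characters the formatted value needs,
  // not counting the terminating null character.
  const int written = std::snprintf(buf, sizeof(buf), "%lld", value);
  if (written > 0 && written <= max_chars) {
    return std::string(buf, static_cast<std::size_t>(written));
  }
  return fallback;
}
```

With `max_chars` set to 4 this accepts the same values as the explicit range test (-999 up to 9999), and raising the limit to 6 only means changing that one argument.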
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2104000789 From kbarrett at openjdk.org Fri May 23 08:01:56 2025 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 23 May 2025 08:01:56 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: <2Gzi0NMd1u_7TiRDwZYJrQ-nvb8FYtkz_7L8R0p5_gw=.19a91a5c-f266-47c3-b1f8-d4a34e3ac47a@github.com> On Fri, 23 May 2025 07:35:00 GMT, Christian Hagedorn wrote: >> Actually it does, see failures in github actions testing: >>> Failed. Execution failed: `main' threw exception: java.lang.RuntimeException: Test found 1 usages of 'NULL' in source files. See errors above. > > Interesting! You're right. I haven't checked the testing results, yet. Should we just change to `Null` or make an exclusion in `TestNoNULL`? I don't think this should be given an exclusion in TestNoNULL. If the printed name is supposed to be related to the compiler object `TypePtr::Null` then I suggest it should be "Null". If it's supposed to be related to the Java null value, then "null" seems appropriate. And in the diagram in the PR intro I see things printed as "NotNull" which might argue for "Null". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2104038585 From mchevalier at openjdk.org Fri May 23 08:09:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 May 2025 08:09:36 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v3] In-Reply-To: References: Message-ID: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. 
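For readers who want to check the arithmetic in the description quoted above, here is a small sketch. It only restates the two bounds as given in that description. The type alias `julong`, the constant `max_juint`, and all helper names are stand-ins defined for this example, not the HotSpot definitions, and the relaxed check is an assumption about the intent, not the actual patch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for HotSpot's julong and max_juint (assumed for this sketch).
using julong = std::uint64_t;
constexpr julong max_juint = 0xFFFFFFFFULL;     // 2^32 - 1
constexpr julong two_pow_31 = julong{1} << 31;  // 2^31

// Bound described as the current one: trip_count < (julong)max_juint / 2.
constexpr bool strict_bound_ok(julong trip_count) {
  return trip_count < max_juint / 2;
}

// Bound described as the intended one: trip_count <= 2^31.
constexpr bool relaxed_bound_ok(julong trip_count) {
  return trip_count <= two_pow_31;
}

// floor((2^32 - 1) / 2) == 2^31 - 1, as stated in the description.
static_assert(max_juint / 2 == two_pow_31 - 1, "integer division rounds down");
// The strict form therefore rejects both 2^31 - 1 and 2^31 ...
static_assert(!strict_bound_ok(two_pow_31 - 1), "2^31 - 1 fails the strict bound");
static_assert(!strict_bound_ok(two_pow_31), "2^31 fails the strict bound");
// ... while the relaxed form accepts everything up to and including 2^31.
static_assert(relaxed_bound_ok(two_pow_31), "2^31 passes the relaxed bound");
static_assert(!relaxed_bound_ok(two_pow_31 + 1), "2^31 + 1 is still rejected");

// Runtime counterpart, mirroring how such a check would sit in an assert.
inline void check_trip_count(julong trip_count) {
  assert(relaxed_bound_ok(trip_count) && "trip count exceeds 2^31");
}
```

The gap between the two forms is exactly two values, 2^31 - 1 and 2^31, which matches the "a bit too tight" wording of the description.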
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25295/files - new: https://git.openjdk.org/jdk/pull/25295/files/9c44f068..4da34fca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=01-02 Stats: 13 lines in 2 files changed: 11 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25295/head:pull/25295 PR: https://git.openjdk.org/jdk/pull/25295 From mchevalier at openjdk.org Fri May 23 08:09:38 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 May 2025 08:09:38 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: References: Message-ID: <6FOXAuT4a0YIf6D0s6bUqpD2vsV-KUySZ8gVo7KK9PU=.e7e8b30f-8202-44ac-a8f6-b04e252f400d@github.com> On Fri, 23 May 2025 07:10:30 GMT, Emanuel Peter wrote: >> Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - A test >> - Merge branch 'master' into fix/do_unroll-assert >> - Relax the assert > > src/hotspot/share/opto/loopTransform.cpp line 1911: > >> 1909: } else if (loop_head->has_exact_trip_count() && init->is_Con()) { >> 1910: // We should not be here if we have old_trip_count == max_juint >> 1911: // it would make trip_count == 2^31 which causes overflow and the situation is overall weird > > Can you say something a little more specific than "weird"? As a reader, it's not immediately clear what that may imply. I've tried to explain the concerns better but tbh, it's also part of the weirdness: it's not clear how inconsistent the state would be and what would and should happen. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2104051975 From aph at openjdk.org Fri May 23 08:21:03 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 23 May 2025 08:21:03 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:52:22 GMT, Martin Doerr wrote: > Ah, thanks! I was not aware of that. That means the current implementation is probably wrong in some cases (mvc generated by gcc). Or is mvc only used in the single Byte aligned case? Yes, that's right, just for the byte-aligned case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2903653531 From duke at openjdk.org Fri May 23 08:28:54 2025 From: duke at openjdk.org (Anjian-Wen) Date: Fri, 23 May 2025 08:28:54 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 06:19:37 GMT, Feilong Jiang wrote: >> Please consider. >> As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. >> >> This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions.
>>
>> Before:
>> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
>> ArrayFill.fillByteArray        7  avgt   12  27.215 ± 0.073  ns/op
>> ArrayFill.fillByteArray       15  avgt   12  32.687 ± 0.904  ns/op
>> ArrayFill.fillIntArray         7  avgt   12  28.629 ± 0.006  ns/op
>> ArrayFill.fillIntArray        15  avgt   12  29.351 ± 0.009  ns/op
>> ArrayFill.fillShortArray       7  avgt   12  30.776 ± 0.006  ns/op
>> ArrayFill.fillShortArray      15  avgt   12  31.724 ± 0.447  ns/op
>> ArrayFill.zeroByteArray        7  avgt   12  27.199 ± 0.006  ns/op
>> ArrayFill.zeroByteArray       15  avgt   12  32.685 ± 0.900  ns/op
>> ArrayFill.zeroIntArray         7  avgt   12  28.630 ± 0.007  ns/op
>> ArrayFill.zeroIntArray        15  avgt   12  29.352 ± 0.011  ns/op
>> ArrayFill.zeroShortArray       7  avgt   12  30.776 ± 0.006  ns/op
>> ArrayFill.zeroShortArray      15  avgt   12  31.497 ± 0.012  ns/op
>>
>> After:
>> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
>> ArrayFill.fillByteArray        7  avgt   12  20.137 ± 0.042  ns/op
>> ArrayFill.fillByteArray       15  avgt   12  32.928 ± 0.004  ns/op
>> ArrayFill.fillIntArray         7  avgt   12  28.630 ± 0.004  ns/op
>> ArrayFill.fillIntArray        15  avgt   12  29.344 ± 0.005  ns/op
>> ArrayFill.fillShortArray       7  avgt   12  31.494 ± 0.004  ns/op
>> ArrayFill.fillShortArray      15  avgt   12  31.492 ± 0.008  ns/op
>> ArrayFill.zeroByteArray        7  avgt   12  19.980 ± 0.164  ns/op
>> ArrayFill.zeroByteArray       15  avgt   12  32.927 ± 0.004  ns/op
>> ArrayFill.zeroIntArray         7  avgt   12  28.629 ± 0.005  ns/op
>> ArrayFill.zeroIntArray        15  avgt   12  29.346 ± 0.006  ns/op
>> ArrayFill.zeroShortArray       7  avgt   12  32.193 ± 0.027  ns/op
>> ArrayFill.zeroShortArray      15  avgt   12  31.495 ± 0.010  ns/op
>>
>> Testing:
>> - [x] tier1
> > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into riscv-optimize-generate-fill > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill > - optimize array fill stub for small size Marked as reviewed by Anjian-Wen at github.com (no known OpenJDK username). Looks good to me!
------------- PR Review: https://git.openjdk.org/jdk/pull/25350#pullrequestreview-2863636519 PR Comment: https://git.openjdk.org/jdk/pull/25350#issuecomment-2903676067 From chagedorn at openjdk.org Fri May 23 08:46:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 08:46:56 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: <2Gzi0NMd1u_7TiRDwZYJrQ-nvb8FYtkz_7L8R0p5_gw=.19a91a5c-f266-47c3-b1f8-d4a34e3ac47a@github.com> References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> <2Gzi0NMd1u_7TiRDwZYJrQ-nvb8FYtkz_7L8R0p5_gw=.19a91a5c-f266-47c3-b1f8-d4a34e3ac47a@github.com> Message-ID: <2y1Gze-r-muBgGVNEtbtIZVs3Cny12fZFurAjcI_q8E=.33cd6967-548f-4cef-a295-6e98db46e4bf@github.com> On Fri, 23 May 2025 07:57:39 GMT, Kim Barrett wrote: >> Interesting! You're right. I haven't checked the testing results, yet. Should we just change to `Null` or make an exclusion in `TestNoNULL`? > > I don't think this should be given an exclusion in TestNoNULL. > > If the printed name is supposed to be related to the compiler object `TypePtr::Null` then I suggest it should > be "Null". If it's supposed to be related to the Java null value, then "null" seems appropriate. And in the > diagram in the PR intro I see things printed as "NotNull" which might argue for "Null". I guess `Null` is fine then. I only chose `NULL` to be in line with the other capitalized letters shown in IGV in the condensed view. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2104122586 From chagedorn at openjdk.org Fri May 23 09:07:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 09:07:17 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v3] In-Reply-To: References: Message-ID: > When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGrapPrinter.cpp` which says that maximally 2 chars are allowed for numbers: > https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 > > But we already allow larger entries today: > ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) > > I there propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. 
> > Without patch: > ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) > > > With patch: > ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: better way to check nof chars, also print narrow oop null ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25393/files - new: https://git.openjdk.org/jdk/pull/25393/files/4f77d003..c4381a03 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25393&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25393&range=01-02 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25393.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25393/head:pull/25393 PR: https://git.openjdk.org/jdk/pull/25393 From chagedorn at openjdk.org Fri May 23 09:07:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 09:07:17 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Fri, 23 May 2025 07:30:18 GMT, Marc Chevalier wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Increase number of parameters as suggested by Manuel > > I like it. Pushed an update to use "Null", an update to count the number of chars written as suggested by @marc-chevalier, and also added a missing case when we just have object = null somewhere and having narrow oops. We currently miss this because we would have `t->base() == NarrowOop`. > src/hotspot/share/opto/idealGraphPrinter.cpp line 657: > >> 655: >> 656: // Only use up to 4 chars and fall back to a generic "L" to keep it short. 
>> 657: if (value >= -999 && value <= 9999) { > > Just for the sake of saying something, can we somewhat factor this test? And ideally, it could be marginally nicer, if I want to see 6 chars, to just replace a 4 into a 6 and it'd work magically. But I don't really see a way beside length(to_string(value)) <= max_length, and it's not really efficient so... If you have a brilliant idea, good. Otherwise, feel free to ignore: changing it as it is now is easy too. Good point! I think we can directly check how many chars are written with `snprintf_checked` and use that as an indicator. I pushed an update. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25393#issuecomment-2903777492 PR Review Comment: https://git.openjdk.org/jdk/pull/25393#discussion_r2104157456 From mchevalier at openjdk.org Fri May 23 09:26:52 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 23 May 2025 09:26:52 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 09:07:17 GMT, Christian Hagedorn wrote: >> When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGrapPrinter.cpp` which says that maximally 2 chars are allowed for numbers: >> https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 >> >> But we already allow larger entries today: >> ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) >> >> I there propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. 
>> >> Without patch: >> ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) >> >> >> With patch: >> ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > better way to check nof chars, also print narrow oop null Marked as reviewed by mchevalier (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25393#pullrequestreview-2863799888 From mhaessig at openjdk.org Fri May 23 09:30:14 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 May 2025 09:30:14 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:33:21 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Roberto Castañeda Lozano Thank you, @eme64, for bearing with us. The API has come a long way since v1. I found some more typos, but otherwise it looks mighty fine.
test/hotspot/jtreg/compiler/lib/template_framework/DataName.java line 32: > 30: * scope with {@link Template#addDataName}, and accessed with {@link Template#dataNames}, from where > 31: * count, list or even sample random {@link DataName}s. Every {@link DataName} has a {@link DataName.Type}, > 32: * so that sampling can be restricted to these types. Suggestion: * {@link DataName}s represent things like fields and local variables, and can be added to the local * scope with {@link Template#addDataName}, and accessed with {@link Template#dataNames}, to * count, list or even sample random {@link DataName}s. Every {@link DataName} has a {@link DataName.Type}, * so that sampling can be restricted to these types. The text flow was a bit weird. I hope this keeps the original meaning. test/hotspot/jtreg/compiler/lib/template_framework/Name.java line 35: > 33: * The name of the type, that can be used in code. > 34: * > 35: * @return The {@String} name of the name, that can be used in code. Suggestion: * The name of the name, that can be used in code. * * @return The {@String} name of the name, that can be used in code. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 150: > 148: * {@link Template#make(String, Function)}. For each number of arguments there is an implementation > 149: * (e.g. {@link Template.TwoArgs} for two arguments). This allows the use of Generics for the > 150: * Template argument types which enables type checking of the Template arguments. Suggestion: * (e.g. {@link Template.TwoArgs} for two arguments). This allows the use of generics for the * {@link Template} argument types which enables type checking of the {@link Template} arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 214: > 212: *

> 213: * When working with {@link DataName}s and {@link StructuralName}s, it is important to be aware of the > 214: * relevant scopes, as well as the execution order of the {@link Template} lambdas, as well as the evaluation Suggestion: * relevant scopes, as well as the execution order of the {@link Template} lambdas and the evaluation Text flow nit test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 217: > 215: * of the {@link Template#body} tokens. When a {@link Template} is rendered, its lambda is invoked. In the > 216: * lambda, we generate the tokens, and create the {@link Template#body}. Once the lambda returns, the > 217: * tokens are evaluated one by one. While evaluating the tokens, the Renderer might encounter a nested Suggestion: * tokens are evaluated one by one. While evaluating the tokens, the {@link Renderer} might encounter a nested test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 696: > 694: * rendering a template with {@code render(fuel)} (e.g. {@link ZeroArgs#render(float)}). > 695: */ > 696: float DEFAULT_FUEL = 100.0f; Suggestion: static final float DEFAULT_FUEL = 100.0f; Is not mutated as far as i can tell. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 703: > 701: * with {@link #setFuelCost(float)} inside {@link #body(Object...)}. > 702: */ > 703: float DEFAULT_FUEL_COST = 10.0f; Suggestion: static final float DEFAULT_FUEL_COST = 10.0f; test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 761: > 759: * or if we also allow it to be mutated. > 760: * @param weight The weight of the {@link DataName}, which correlates to the probability > 761: * of this {@link DataName} being chosen when we sample. Should document the weight limit. Suggestion: * @param weight The weight of the {@link DataName}, which correlates to the probability * of this {@link DataName} being chosen when we sample. Must be smaller than 1000. 
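
As an illustration of the weight limit suggested above, here is a minimal sketch of weighted sampling with the bound checked in one shared place. It is not the framework's implementation: `WeightedName`, `sample`, and the lower bound (weight must be positive) are assumptions made for this example, and the upper bound takes the wording "smaller than 1000" literally.

```java
import java.util.List;
import java.util.Random;

class WeightedSamplingDemo {
    // Hypothetical stand-in for a name with a sampling weight; not the framework's DataName.
    record WeightedName(String name, int weight) {
        WeightedName {
            // Assumed bounds: positive, and smaller than 1000 as in the review suggestion.
            if (weight <= 0 || weight >= 1000) {
                throw new IllegalArgumentException("weight out of range: " + weight);
            }
        }
    }

    // Picks one name with probability proportional to its weight.
    static WeightedName sample(List<WeightedName> names, Random random) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("nothing to sample from");
        }
        int total = names.stream().mapToInt(WeightedName::weight).sum();
        int r = random.nextInt(total); // 0 <= r < total
        for (WeightedName n : names) {
            r -= n.weight();
            if (r < 0) {
                return n;
            }
        }
        throw new IllegalStateException("unreachable: weights sum to total");
    }

    public static void main(String[] args) {
        List<WeightedName> names = List.of(
                new WeightedName("rare", 1),
                new WeightedName("common", 99));
        System.out.println(sample(names, new Random()).name());
    }
}
```

Because the check sits in the record's constructor, every caller that creates a name gets the same validation, which is the design point raised in the review about where input validation should live.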
test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 770:

> 768: }
> 769: boolean mutable = mutability == DataName.Mutability.MUTABLE;
> 770: return new AddNameToken(new DataName(name, type, mutable, weight));

"Input validation" for mutability happens here, but inside the constructor for weight. Should both happen in the same place?

test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 332:

> 330: // The Hook is set for the Tokens inside the set braces.
> 331: // As long as the hook is anchored, we can insert code into the hook,
> 332: // here we can define static fields for example.

Perhaps this comment should mention that the code is inserted at the point where `myHook.anchor` is located.

test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 627:

> 625: // the current scope.
> 626: var templateSample = Template.make("type", (DataName.Type type) -> body(
> 627: let("name", dataNames(MUTABLE).exactOf(type).sample().name()),

Suggestion:

let("name", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(type).sample().name()),

To get all available names you also need immutable names. There is no difference in this case, but since this is a tutorial, the semantics should match the explanations.

test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 636:

> 634: var templateStatus = Template.make(() -> body(
> 635: let("ints", dataNames(MUTABLE).exactOf(myInt).count()),
> 636: let("longs", dataNames(MUTABLE).exactOf(myLong).count()),

Suggestion:

let("ints", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).count()),
let("longs", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myLong).count()),

-------------

Changes requested by mhaessig (Author).

PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2863614753 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104076338 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104100212 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104137360 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104147066 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104147982 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104164360 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104166142 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104126982 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104129597 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104181130 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104189394 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104190251 From mhaessig at openjdk.org Fri May 23 09:30:15 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 23 May 2025 09:30:15 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 08:51:09 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Roberto Casta?eda Lozano > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 150: > >> 148: * {@link Template#make(String, Function)}. For each number of arguments there is an implementation >> 149: * (e.g. {@link Template.TwoArgs} for two arguments). This allows the use of Generics for the >> 150: * Template argument types which enables type checking of the Template arguments. > > Suggestion: > > * (e.g. 
{@link Template.TwoArgs} for two arguments). This allows the use of generics for the > * {@link Template} argument types which enables type checking of the {@link Template} arguments. Perhaps `s/ Template / {@link Template} /g` might be good for consistency. > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 627: > >> 625: // the current scope. >> 626: var templateSample = Template.make("type", (DataName.Type type) -> body( >> 627: let("name", dataNames(MUTABLE).exactOf(type).sample().name()), > > Suggestion: > > let("name", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(type).sample().name()), > > To get all available names you also need immutable names. There is no difference in this case, but this being a tutorial, the semantics should match the explanations. Then again, the mutability concept is introduced further down. I'm a bit torn... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104139554 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104195281 From shade at openjdk.org Fri May 23 09:31:52 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 23 May 2025 09:31:52 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details In-Reply-To: References: Message-ID: <4BJrQqFckuixWGmZqmFzd-rTJ-mJrlo17pNl_mIWn-M=.d5f308f3-3aba-44ec-9ea7-d030d97a997e@github.com> On Thu, 22 May 2025 22:40:51 GMT, Cesar Soares Lucas wrote: > Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. > > Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. A bit concerned about performance impact of this logging, especially since we are under `CodeCache_lock`. So I would suggest two improvements: 1. Maybe move logging before acquiring `CodeCache_lock`? 
Not sure if it is safe for various `CodeCache::*` getters.
 2. Predicate the argument preparation/logging with:

```
LogTarget(Debug, codecache) lt;
if (lt.is_enabled()) {
  ...
}
```

-------------

PR Review: https://git.openjdk.org/jdk/pull/25402#pullrequestreview-2863813220

From epeter at openjdk.org Fri May 23 09:49:47 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 23 May 2025 09:49:47 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v58]
In-Reply-To: 
References: 
Message-ID: 

> **Goal**
> We want to generate Java source code:
> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements).
>
> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC).
>
> **How to get started**
> When reviewing, please start by looking at:
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76
>
> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework.
>
> Second, look at this advanced test:
> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119
>
> And then for a "tutorial", look at:
> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java`
>
> It shows these features:
> - The `body` of a Template is essentially a list of `Token`s that are concatenated.
> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 

Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision:

 - static final
 - Apply suggestions from code review

   Co-authored-by: Manuel Hässig

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/24217/files
  - new: https://git.openjdk.org/jdk/pull/24217/files/bd79554d..8892a3ac

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=57
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=56-57

  Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/24217.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217

PR: https://git.openjdk.org/jdk/pull/24217

From epeter at openjdk.org Fri May 23 09:49:49 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 23 May 2025 09:49:49 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v57]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 23 May 2025 09:06:20 GMT, Manuel Hässig wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Apply suggestions from code review
>>
>> Co-authored-by: Roberto Castañeda Lozano
>
> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 696:
>
>> 694: * rendering a template with {@code render(fuel)} (e.g. {@link ZeroArgs#render(float)}).
>> 695: */
>> 696: float DEFAULT_FUEL = 100.0f;
>
> Suggestion:
>
> static final float DEFAULT_FUEL = 100.0f;
>
> Is not mutated as far as I can tell.

applied!

> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 703:
>
>> 701: * with {@link #setFuelCost(float)} inside {@link #body(Object...)}.
>> 702: */
>> 703: float DEFAULT_FUEL_COST = 10.0f;
>
> Suggestion:
>
> static final float DEFAULT_FUEL_COST = 10.0f;

applied!

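
A small sketch of what the `static final` suggestion changes, under one assumption: the quoted snippet does not show the enclosing type, so this example supposes the constants live in an interface. In that case the Java language already treats every field as `public static final`, and writing the modifiers only makes the intent explicit. `FuelDefaults` and its fields are hypothetical names for illustration, not the real `Template` declaration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical interface declaring two constants, one with and one without modifiers.
interface FuelDefaults {
    float IMPLICIT_FUEL = 100.0f;               // no modifiers written
    static final float EXPLICIT_FUEL = 100.0f;  // modifiers spelled out
}

class InterfaceConstantDemo {
    // Reports whether the named field of FuelDefaults is both static and final.
    static boolean isStaticFinal(String fieldName) {
        try {
            Field f = FuelDefaults.class.getDeclaredField(fieldName);
            int m = f.getModifiers();
            return Modifier.isStatic(m) && Modifier.isFinal(m);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("implicit: " + isStaticFinal("IMPLICIT_FUEL"));
        System.out.println("explicit: " + isStaticFinal("EXPLICIT_FUEL"));
    }
}
```

If the enclosing type were a class instead, the unmodified field would be a mutable instance field, and the suggestion would change behavior rather than just readability.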
> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 770:
>
>> 768: }
>> 769: boolean mutable = mutability == DataName.Mutability.MUTABLE;
>> 770: return new AddNameToken(new DataName(name, type, mutable, weight));
>
> "Input validation" for mutability happens here, but inside the constructor for weight. Should both happen in the same place?

`mutability` only applies to `DataName`. `weight` also applies to `StructuralName`, so I put it in the shared code. I'd rather avoid duplicating that code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104232423
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104232536
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104224150

From shade at openjdk.org Fri May 23 10:05:26 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 23 May 2025 10:05:26 GMT
Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list
Message-ID: 

See bug for more discussion.

This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates.

Additional testing:
 - [x] Linux x86_64 server fastdebug, `compiler`
 - [ ] Linux AArch64 server fastdebug, `all`

-------------

Commit messages:
 - Comments and indenting
 - Basic deletion

Changes: https://git.openjdk.org/jdk/pull/25409/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25409&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357473
  Stats: 113 lines in 4 files changed: 14 ins; 62 del; 37 mod
  Patch: https://git.openjdk.org/jdk/pull/25409.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25409/head:pull/25409

PR: https://git.openjdk.org/jdk/pull/25409

From epeter at openjdk.org Fri May 23 10:06:11 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 23 May 2025 10:06:11 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v57]
In-Reply-To: 
References: 
Message-ID: <9h9r5j_cQJ9V2AHWdX-qwFKu64yXgDIErlCIcmrKVVA=.2bb8daa9-b616-4c01-902a-af91ac06463a@github.com>

On Fri, 23 May 2025 09:15:52 GMT, Manuel Hässig wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Apply suggestions from code review
>>
>> Co-authored-by: Roberto Castañeda Lozano
>
> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 332:
>
>> 330: // The Hook is set for the Tokens inside the set braces.
>> 331: // As long as the hook is anchored, we can insert code into the hook,
>> 332: // here we can define static fields for example.
>
> Perhaps this comment should mention that the code is inserted at the point where `myHook.anchor` is located.

I thought that's what I said? I reformulated it, and hope it is clearer now.
![image](https://github.com/user-attachments/assets/fb51a98d-d0c6-44ae-a414-557c38ff9639)

> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 636:
>
>> 634: var templateStatus = Template.make(() -> body(
>> 635: let("ints", dataNames(MUTABLE).exactOf(myInt).count()),
>> 636: let("longs", dataNames(MUTABLE).exactOf(myLong).count()),
>
> Suggestion:
>
> let("ints", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myInt).count()),
> let("longs", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(myLong).count()),

Same here. I want to keep it simple for now, since we cover the concept only later on. Added a comment for that.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104255537
PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104262459

From epeter at openjdk.org Fri May 23 10:06:12 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 23 May 2025 10:06:12 GMT
Subject: RFR: 8344942: Template-Based Testing Framework [v57]
In-Reply-To: 
References: 
Message-ID: 

On Fri, 23 May 2025 09:24:15 GMT, Manuel Hässig wrote:

>> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 627:
>>
>>> 625: // the current scope.
>>> 626: var templateSample = Template.make("type", (DataName.Type type) -> body(
>>> 627: let("name", dataNames(MUTABLE).exactOf(type).sample().name()),
>>
>> Suggestion:
>>
>> let("name", dataNames(MUTABLE_OR_IMMUTABLE).exactOf(type).sample().name()),
>>
>> To get all available names you also need immutable names. There is no difference in this case, but this being a tutorial, the semantics should match the explanations.
>
> Then again, the mutability concept is introduced further down. I'm a bit torn...

I'd like to keep it simple at that point, yes. But I can leave a comment saying that we will look into that later.

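
To make the point of this exchange concrete, here is a minimal sketch of filtering names by mutability. It only mirrors the idea discussed above and is not the framework's API: `Name`, `Filter`, and `count` are hypothetical, and the semantics (a `MUTABLE` filter matches only mutable names, `MUTABLE_OR_IMMUTABLE` matches all) are assumed from the review comments.

```java
import java.util.List;

class MutabilityFilterDemo {
    // Hypothetical filter values, named after the constants quoted in the review.
    enum Filter { MUTABLE, MUTABLE_OR_IMMUTABLE }

    // Hypothetical stand-in for a registered name with a type and a mutability flag.
    record Name(String name, String type, boolean mutable) {}

    // Counts names of exactly the given type that pass the mutability filter.
    static long count(List<Name> names, Filter filter, String type) {
        return names.stream()
                .filter(n -> n.type().equals(type))
                .filter(n -> filter == Filter.MUTABLE_OR_IMMUTABLE || n.mutable())
                .count();
    }

    public static void main(String[] args) {
        // While every registered name is mutable, both filters agree.
        List<Name> allMutable = List.of(
                new Name("a", "int", true),
                new Name("b", "int", true),
                new Name("c", "long", true));
        System.out.println(count(allMutable, Filter.MUTABLE, "int"));
        System.out.println(count(allMutable, Filter.MUTABLE_OR_IMMUTABLE, "int"));

        // Once an immutable name is added, only MUTABLE_OR_IMMUTABLE sees it.
        List<Name> mixed = List.of(
                new Name("a", "int", true),
                new Name("k", "int", false));
        System.out.println(count(mixed, Filter.MUTABLE, "int"));
        System.out.println(count(mixed, Filter.MUTABLE_OR_IMMUTABLE, "int"));
    }
}
```

Under these assumptions the two filters only diverge once an immutable name is registered, which matches the observation that there is no difference at this point in the tutorial.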
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104258195 From epeter at openjdk.org Fri May 23 10:12:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:12:22 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v59] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more from Manuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/8892a3ac..8acc300c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=58 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=57-58 Stats: 17 lines in 1 file changed: 9 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 23 10:12:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:12:23 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 09:27:09 GMT, Manuel H?ssig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: 
Roberto Casta?eda Lozano > > Thank you, @eme64, for bearing with us. The API has come a long way since v1. > > I found some more typos, but otherwise it looks mighty fine. @mhaessig Thanks for looking at this again, and all the suggestions! I applied most, and responded to the rest :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2903947854 From duke at openjdk.org Fri May 23 10:17:05 2025 From: duke at openjdk.org (Axel Hultin) Date: Fri, 23 May 2025 10:17:05 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v59] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 10:12:22 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more from Manuel test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 331: > 329: // We anchor a Hook outside the main method, but inside the Class. > 330: // Anchoring a Hook creates a scope, spanning the braces of the > 331: // "anchor" call. 
Any Hool.insert that happens inside this scope Nit: `Hool.insert` -> `Hook.insert` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104280973 From azafari at openjdk.org Fri May 23 10:20:52 2025 From: azafari at openjdk.org (Afshin Zafari) Date: Fri, 23 May 2025 10:20:52 GMT Subject: RFR: 8352141: UBSAN: fix the left shift of negative value in relocInfo.cpp, internal_word_Relocation::pack_data_to() In-Reply-To: References: Message-ID: On Fri, 23 May 2025 00:26:55 GMT, Dean Long wrote: > I suspect that the `&= right_n_bits()` trick I proposed for 8352140 would also work for this case. It's a nice fix for the left-shift cases, IMO. I wasn't aware of `right_n_bits()`. +1 vote for. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24196#issuecomment-2903969449 From epeter at openjdk.org Fri May 23 10:22:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:22:33 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v60] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. 
And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/8acc300c..e380a156 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=59 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=58-59 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 23 10:22:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:22:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v59] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 10:12:22 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more from Manuel test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 331: > 329: // We anchor a Hook outside the main method, but inside the Class. > 330: // Anchoring a Hook creates a scope, spanning the braces of the > 331: // "anchor" call. 
Any Hool.insert that happens inside this scope Suggestion: // "anchor" call. Any Hook.insert that happens inside this scope ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104285680 From epeter at openjdk.org Fri May 23 10:22:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:22:34 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v59] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 10:17:34 GMT, Emanuel Peter wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more from Manuel > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 331: > >> 329: // We anchor a Hook outside the main method, but inside the Class. >> 330: // Anchoring a Hook creates a scope, spanning the braces of the >> 331: // "anchor" call. Any Hool.insert that happens inside this scope > > Suggestion: > > // "anchor" call. Any Hook.insert that happens inside this scope Fix this ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104286647 From mhaessig at openjdk.org Fri May 23 10:46:07 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 May 2025 10:46:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: <0C9cjq726UlMZZX2mPZkr8Vl303XNislrRW_PueNNWs=.59be81ac-f638-41d0-ba84-640cb547779c@github.com> On Fri, 23 May 2025 09:41:00 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 770: >> >>> 768: } >>> 769: boolean mutable = mutability == DataName.Mutability.MUTABLE; >>> 770: return new AddNameToken(new DataName(name, type, mutable, weight)); >> >> "Input validation" for mutability happens here, but inside the constructor for weight. Should both happen in the same place? > > `mutability` only applies to `DataName`.
`weight` also applies to `StructuralName`, so I put it in the shared code. I'd rather avoid duplicating that code. It is duplicated in the constructors of `DataName` and `StructuralName` respectively. If those checks are moved to `Name`, I do understand the separation. >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 332: >> >>> 330: // The Hook is set for the Tokens inside the set braces. >>> 331: // As long as the hook is anchored, we can insert code into the hook, >>> 332: // here we can define static fields for example. >> >> Perhaps this comment should mention that the code is inserted at the point where `myHook.anchor` is located. > > I thought that's what I said. > I reformulated it, and hope it is clearer now. > > ![image](https://github.com/user-attachments/assets/fb51a98d-d0c6-44ae-a414-557c38ff9639) That is as clear as can be. Thank you ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104318644 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104316322 From epeter at openjdk.org Fri May 23 10:46:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:46:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: <0C9cjq726UlMZZX2mPZkr8Vl303XNislrRW_PueNNWs=.59be81ac-f638-41d0-ba84-640cb547779c@github.com> References: <0C9cjq726UlMZZX2mPZkr8Vl303XNislrRW_PueNNWs=.59be81ac-f638-41d0-ba84-640cb547779c@github.com> Message-ID: On Fri, 23 May 2025 10:39:22 GMT, Manuel Hässig wrote: >> `mutability` only applies to `DataName`. `weight` applies also to `StructuralName`, so I put it in the shared code. I'd rather avoid duplicating that code. > > It is duplicated in the constructors of `DataName` and `StructuralName` respectively. If those checks are moved to `Name`, I do understand the separation. Oh. I should check my own code rather than just try to remember what I did a while ago.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104321846 From epeter at openjdk.org Fri May 23 10:46:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:46:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: <0C9cjq726UlMZZX2mPZkr8Vl303XNislrRW_PueNNWs=.59be81ac-f638-41d0-ba84-640cb547779c@github.com> Message-ID: On Fri, 23 May 2025 10:41:33 GMT, Emanuel Peter wrote: >> It is duplicated in the constructors of `DataName` and `StructuralName` respectively. If those checks are moved to `Name`, I do understand the separation. > > Oh. I should check my own code rather than just try to remember what I did a while ago. It's hard to move the verification to `Name`, because it is only an interface... I'll move it to `add...Name` as you suggested. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104323907 From mhaessig at openjdk.org Fri May 23 10:46:08 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 May 2025 10:46:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 10:00:55 GMT, Emanuel Peter wrote: >> Then again, the mutability concept is introduced further down. I'm a bit torn... > > I'd like to keep it simple at that point, yes. But I can leave a comment saying that we will look into that later. Keeping it simple sounds good. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2104320523 From epeter at openjdk.org Fri May 23 10:52:20 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:52:20 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc.
> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. 
> - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 - move verification ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/e380a156..3c4e1ce2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=60 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=59-60 Stats: 15 lines in 3 files changed: 8 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Fri May 23 10:52:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 10:52:21 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v57] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 09:27:09 GMT, Manuel Hässig wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review >> >> Co-authored-by: Roberto Castañeda Lozano > > Thank you, @eme64, for bearing with us. The API has come a long way since v1. > > I found some more typos, but otherwise it looks mighty fine. @mhaessig Ok, I moved the verification code. I think I addressed everything, right?
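The "move verification" change discussed in this thread could look roughly like the sketch below. It is only an illustration under assumptions, not the framework's code: the nested `Name`, `DataName`, `StructuralName` and `Mutability` types are simplified stand-ins, the `String` type parameter, the `checkWeight` helper and the "weight must be positive" bound are made up, and `addDataName` / `addStructuralName` are a guess at what `add...Name` abbreviates. The point is that an interface cannot host constructor checks, so the shared check lives in one helper that both factory methods call.

```java
// Hypothetical sketch: all types below are simplified stand-ins, not the
// Template Framework's real classes.
class NameValidationDemo {
    interface Name {
        String name();
        int weight();
    }

    enum Mutability { MUTABLE, IMMUTABLE, MUTABLE_OR_IMMUTABLE }

    record DataName(String name, String type, boolean mutable, int weight) implements Name {}

    record StructuralName(String name, String type, int weight) implements Name {}

    // Shared check, written once and used by both factory methods.
    // The bound (weight must be positive) is an assumption for illustration.
    private static void checkWeight(int weight) {
        if (weight <= 0) {
            throw new IllegalArgumentException("Unexpected weight: " + weight);
        }
    }

    static DataName addDataName(String name, String type, Mutability mutability, int weight) {
        // Mutability only applies to DataName, so it is validated only here.
        if (mutability != Mutability.MUTABLE && mutability != Mutability.IMMUTABLE) {
            throw new IllegalArgumentException("Unexpected mutability: " + mutability);
        }
        checkWeight(weight);
        return new DataName(name, type, mutability == Mutability.MUTABLE, weight);
    }

    static StructuralName addStructuralName(String name, String type, int weight) {
        checkWeight(weight);
        return new StructuralName(name, type, weight);
    }

    public static void main(String[] args) {
        System.out.println(addDataName("x", "int", Mutability.MUTABLE, 1));
        try {
            addStructuralName("s", "MyClass", 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this layout, both kinds of input validation happen in the same place (the factory methods), and the records stay plain data carriers.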
------------- PR Comment: https://git.openjdk.org/jdk/pull/24217#issuecomment-2904043523 From mhaessig at openjdk.org Fri May 23 11:19:07 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 23 May 2025 11:19:07 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: <4oIKTOxwJqvGuFeD4XWMrXoTj5BglEjE8vS8ZpJVFCY=.43387593-c138-418f-b0b2-93f4a635b7d8@github.com> On Fri, 23 May 2025 10:52:20 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. >> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. 
>> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 > - move verification > I think I addressed everything, right? Indeed. Thank you! ------------- Marked as reviewed by mhaessig (Author). PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2864070602 From epeter at openjdk.org Fri May 23 13:26:34 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 23 May 2025 13:26:34 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability Message-ID: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. 
The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing.
- `0`: abort vectorization, as if it was not profitable.
- `1`: default, use profitability heuristics to determine if we should vectorize.
- `2`: always vectorize when possible, even if the profitability heuristic would say that it is not profitable.

In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking.

I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them.

Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on.

And if you want a small test to experiment with, I have one at the end for you.

**Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch.

--------------------------------------

**Use-Case: investigate Reduction Heuristics**

A while back, I wrote a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic.

This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable.
- https://bugs.openjdk.org/browse/JDK-8078563
- https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html

From the comments, it becomes clear that "simple reductions" are not profitable, which is why we check if there are more work vectors than reduction vectors.
But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long.

But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions, because there we must keep the strict order of reductions. Otherwise we risk wrong rounding results.

Since then, we have had multiple reports that simple reductions are not vectorized, and I am working on it: https://bugs.openjdk.org/browse/JDK-8307516

I ran the reduction benchmarks from https://github.com/openjdk/jdk/pull/21032 (please have a look at it now, the results below are only going to be more complicated!), like this:

    make test TEST="micro:vm.compiler.VectorReduction2.WithSuperword" CONF=linux-x64 TEST_VM_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=2"

I ran the experiments on my `x64 / AVX512` machine, and an `aarch64 / neon` machine. For each I ran with `SuperWord` disabled (`no`), and with `SuperWord` and `AutoVectorizationOverrideProfitability` set to 1 (default), 0 (abort vectorization), and 2 (force vectorization).

![image](https://github.com/user-attachments/assets/38f87e05-f179-42db-ab9a-42ace206ecc4)
![image](https://github.com/user-attachments/assets/bc56a4fd-a020-4108-9876-a082758d0c77)

The orange `heuristic` tags show where the heuristic makes a difference - in this case we prevent vectorization even though it would be faster. This is evidence that we need to update the heuristic.

Interestingly, forcing vectorization in the `strict` cases did not lead to any performance drop. It seems that forced vectorization is only problematic in one case: `longMulSimple` on `aarch64`. I need to investigate.
Generally, we do vectorize (if forced - they are 2-element vectors after all) at least some of the `long` cases (hand checked `longAddSimple`), but it seems it is just not very fast, no idea why. The problematic `longMulSimple` does also vectorize (only if forced), but it is consistently slow. The confusing part: `longMulDotProduct` should be even slower. But a quick investigation showed that we actually do not vectorize it, because the packing algorithm gets confused about which multiplications to pack. I suspect that generally 2-element multiplication reduction is very slow on `neon / aarch64`. We will have to be careful about that when we change the heuristic.

**It is edge cases like these that make me nervous, and are the reason why I have not changed these heuristics sooner.** I would also have to investigate the impact on a few more platforms, especially on `AVX` and `AVX2`.

With `x64` and `byte/char/short`, we never vectorize. Still, enabling `SuperWord` changes the level of unrolling, and it seems that in some cases enabling `SuperWord` leads to over-unrolling, hence the slowdowns in some cases. We should investigate that as well.

For now it is clear: this flag would be helpful for improving performance heuristics.
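The reordering argument above (int/long reductions can be reordered and moved out of the loop, float/double reductions cannot) can be illustrated with a small standalone sketch. Everything in it is made up for illustration: the class name, the helper names and the input values are assumptions, and the "lanes" loop only mimics the summation order of a vectorized reduction whose partial sums are combined after the loop; it is not what C2 actually emits.

```java
// Illustrative only: compares strict sequential summation with a lane-wise
// summation order, as a vectorized reduction would use.
class ReductionOrderDemo {
    static int sumSequential(int[] a) {
        int sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    // Accumulate into `lanes` partial sums, then combine them after the loop.
    static int sumInLanes(int[] a, int lanes) {
        int[] partial = new int[lanes];
        for (int i = 0; i < a.length; i++) {
            partial[i % lanes] += a[i];
        }
        int sum = 0;
        for (int p : partial) {
            sum += p;
        }
        return sum;
    }

    static float sumSequential(float[] a) {
        float sum = 0;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    static float sumInLanes(float[] a, int lanes) {
        float[] partial = new float[lanes];
        for (int i = 0; i < a.length; i++) {
            partial[i % lanes] += a[i];
        }
        float sum = 0;
        for (float p : partial) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Int addition wraps around, but it stays associative and commutative,
        // so both orders give the same result (7 for this input).
        int[] ints = {Integer.MAX_VALUE, 1, Integer.MIN_VALUE, 7};
        System.out.println(sumSequential(ints) + " vs " + sumInLanes(ints, 2));

        // Float addition rounds after every step: 1e8f + 1f rounds back to 1e8f,
        // so the sequential order yields 1.0 while the 2-lane order yields 2.0.
        float[] floats = {1e8f, 1f, -1e8f, 1f};
        System.out.println(sumSequential(floats) + " vs " + sumInLanes(floats, 2));
    }
}
```

This is why a reordered float/double reduction would risk different rounding results than the scalar loop, while the int/long case is safe.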
---------------------------------------

**Example for the Flag**

I played around with an example like this:

    java -XX:CompileCommand=compileonly,Test::test2 -XX:CompileCommand=TraceAutoVectorization,Test::test*,ALL -Xbatch -XX:AutoVectorizationOverrideProfitability=0 -XX:MaxVectorSize=64 Test.java

    public class Test {
        public static int[] a = new int[10_000];

        public static void main(String[] args) {
            for (int i = 0; i < a.length; i++) {
                a[i] = (int)i;
            }
            for (int i = 0; i < 10_000; i++) {
                test1();
                test2(a, a);
            }
            System.out.println("sum: " + test1());
        }

        public static int test1() {
            int sum = 0;
            for (int i = 0; i < a.length; i++) {
                sum += a[i];
            }
            return sum;
        }

        public static void test2(int[] a, int[] b) {
            for (int i = 0; i < a.length; i++) {
                a[i] = b[i];
            }
        }
    }

-------------

Commit messages:
 - fix little bug
 - Merge branch 'master' into JDK-8357530-SuperWordOverrideProfitability
 - improve test more
 - int tests
 - improve test
 - wip test
 - manual merge
 - more changes and printing
 - JDK-8357530

Changes: https://git.openjdk.org/jdk/pull/25387/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8357530
  Stats: 233 lines in 3 files changed: 225 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/25387.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/25387/head:pull/25387

PR: https://git.openjdk.org/jdk/pull/25387

From epeter at openjdk.org Fri May 23 13:42:53 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 23 May 2025 13:42:53 GMT
Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability
In-Reply-To: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com>
References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com>
Message-ID:

On Thu, 22 May 2025 08:54:42 GMT, Emanuel Peter wrote:

> I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`.
The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. > - `0`: abort vectorization, as if it was not profitable. > - `1`: default, use profitability heuristics to determine if we should vectorize. > - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. > > In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. > > I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. > > Below, I'm showing how it helps to benchmark the some reduction cases we have been working on. > > And if you want a small test to experiement with, I have one at the end for you. > > **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. > > -------------------------------------- > > **Use-Case: investigate Reduction Heuristics** > > A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. > > This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. > - https://bugs.openjdk.org/browse/JDK-8078563 > - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html > From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. 
But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. > > But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions,... @galderz You may be interested in these results ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25387#issuecomment-2904464156 From mhaessig at openjdk.org Fri May 23 13:47:02 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Fri, 23 May 2025 13:47:02 GMT Subject: RFR: 8357649: IGV: add block index to the supplemental node properties Message-ID: This PR adds the block index to IGV node properties as soon as the CFG has been scheduled. This is really handy when working on peepholes, where one has to work with block indices. ![Screenshot from 2025-05-23 15-35-29](https://github.com/user-attachments/assets/1a895e07-cf5f-4eed-afd0-08fe26cf0266) Testing: - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions?query=branch%3AJDK-8357649-block-index) - [ ] tier1, tier2, and some Oracle internal testing on Oracle supported platforms and OSs Shout out to @robcasloz for coming up with an initial version of this patch.
------------- Commit messages: - Add block_index to IGV nodes Changes: https://git.openjdk.org/jdk/pull/25414/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25414&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357649 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25414/head:pull/25414 PR: https://git.openjdk.org/jdk/pull/25414 From rcastanedalo at openjdk.org Fri May 23 13:55:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 23 May 2025 13:55:57 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 13:30:28 GMT, Jatin Bhateja wrote: > > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > > > > > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. 
> > Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java OK, thanks for checking Jatin! Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2904504259 From rcastanedalo at openjdk.org Fri May 23 14:01:50 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 23 May 2025 14:01:50 GMT Subject: RFR: 8357649: IGV: add block index to the supplemental node properties In-Reply-To: References: Message-ID: <1JTbSsJoroyd0ahwIKoQd-vNwn9JJBusLc56GUVRQFM=.10684ed4-44f6-452e-aecc-c06f036b483c@github.com> On Fri, 23 May 2025 13:42:12 GMT, Manuel H?ssig wrote: > This PR adds the block index to IGV node properties as soon as the CFG has been scheduled. This is really handy when working on peepholes, where one has to work with block indices. > > ![Screenshot from 2025-05-23 15-35-29](https://github.com/user-attachments/assets/1a895e07-cf5f-4eed-afd0-08fe26cf0266) > > Testing: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions?query=branch%3AJDK-8357649-block-index) > - [ ] tier1, tier2, and some Oracle internal testing on Oracle supported platfors and OSs > > Shout out to @robcasloz for coming up with an initial version of this patch. Looks good, thanks! ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25414#pullrequestreview-2864535936 From chagedorn at openjdk.org Fri May 23 15:01:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 23 May 2025 15:01:50 GMT Subject: RFR: 8357649: IGV: add block index to the supplemental node properties In-Reply-To: References: Message-ID: <2i_bTT-7CsREYxS4ET030wxON1EbiSbF-ndQ4GLr4FM=.82b6ece9-3bd3-47ba-b414-14eda17fb15c@github.com> On Fri, 23 May 2025 13:42:12 GMT, Manuel H?ssig wrote: > This PR adds the block index to IGV node properties as soon as the CFG has been scheduled. This is really handy when working on peepholes, where one has to work with block indices. > > ![Screenshot from 2025-05-23 15-35-29](https://github.com/user-attachments/assets/1a895e07-cf5f-4eed-afd0-08fe26cf0266) > > Testing: > - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions?query=branch%3AJDK-8357649-block-index) > - [ ] tier1, tier2, and some Oracle internal testing on Oracle supported platfors and OSs > > Shout out to @robcasloz for coming up with an initial version of this patch. Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25414#pullrequestreview-2864736108 From rcastanedalo at openjdk.org Fri May 23 15:09:52 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 23 May 2025 15:09:52 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:19:08 GMT, Daniel Skantz wrote: > This pull request contains a fix for JDK-8357105. 
> > The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. > > The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. > > Testing: > Tier1-4. > > Extra testing: > Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. The overall analysis and fix look good to me, I just have some minor style, test, and code comment suggestions. src/hotspot/share/opto/stringopts.cpp line 991: > 989: > 990: // A test which leads to an uncommon trap which could be safe. > 991: // If so, this trap will later be converted into a trap that restarts I suggest to make the safety condition clearer in the comment, something like this: Suggestion: // A test which leads to an uncommon trap. It is safe to convert the trap // into a trap that restarts at the beginning as long as its test does not // depend on intermediate results of the candidate chain. src/hotspot/share/opto/stringopts.cpp line 996: > 994: CallStaticJavaNode* call = otherproj->unique_out()->isa_CallStaticJava(); > 995: if (call != nullptr && call->_name != nullptr && strcmp(call->_name, "uncommon_trap") == 0) { > 996: // first check for dependency on a toString that is going away during stacked concats. Suggestion: // First check for dependency on a toString that is going away during stacked concats. 
src/hotspot/share/opto/stringopts.cpp line 998: > 996: // first check for dependency on a toString that is going away during stacked concats. > 997: if (_multiple && ((v1->is_Proj() && is_SB_toString(v1->in(0)) && ctrl_path.member(v1->in(0))) > 998: || (v2->is_Proj() && is_SB_toString(v2->in(0)) && ctrl_path.member(v2->in(0))))) { For consistency with the surrounding code, and to make the symmetry between the two cases more obvious: Suggestion: if (_multiple && ((v1->is_Proj() && is_SB_toString(v1->in(0)) && ctrl_path.member(v1->in(0))) || (v2->is_Proj() && is_SB_toString(v2->in(0)) && ctrl_path.member(v2->in(0))))) { test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsAppendUncommonTrap.java line 30: > 28: * of the first StringBuilder chain is wired into an uncommon trap > 29: * located in the second one. > 30: * @run main/othervm compiler.stringopts.TestStackedConcatsAppendUncommonTrap Please add a second run that is constrained via JVM flags to be more stable and easier to analyze, using `-Xbatch`, `-XX:CompileOnly=...`, and perhaps `-XX:-TieredCompilation`. Using `-Xbatch` also allows you to reduce the number of warm-up iterations, many tests use `10_000`. ------------- Changes requested by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25395#pullrequestreview-2864732620 PR Review Comment: https://git.openjdk.org/jdk/pull/25395#discussion_r2104781146 PR Review Comment: https://git.openjdk.org/jdk/pull/25395#discussion_r2104782240 PR Review Comment: https://git.openjdk.org/jdk/pull/25395#discussion_r2104785775 PR Review Comment: https://git.openjdk.org/jdk/pull/25395#discussion_r2104792453 From kvn at openjdk.org Fri May 23 15:13:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 15:13:52 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability In-Reply-To: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: On Thu, 22 May 2025 08:54:42 GMT, Emanuel Peter wrote: > I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. > - `0`: abort vectorization, as if it was not profitable. > - `1`: default, use profitability heuristics to determine if we should vectorize. > - `2`: always vectorize when possible, even if the profitability heuristic would say that it is not profitable. > > In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. > > I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. > > Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on.
> > And if you want a small test to experiment with, I have one at the end for you. > > **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. > > -------------------------------------- > > **Use-Case: investigate Reduction Heuristics** > > A while back, I wrote a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. > > This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. > - https://bugs.openjdk.org/browse/JDK-8078563 > - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html > From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. > > But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions,... This looks fine. One suggestion I have for a separate RFE is to use UL for such outputs. ------------- Marked as reviewed by kvn (Reviewer).
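The quoted description above elides its own small test, so the following is only a minimal sketch of the kind of "simple" int add-reduction that the text says the heuristic treats as not profitable. The class name `ReductionDemo`, the array size, and the iteration count are assumptions made for illustration; whether C2 actually vectorizes this particular loop under each flag value is not confirmed by the source.

```java
// Hypothetical demo class; not the test from the PR.
public class ReductionDemo {

    // A "simple reduction": the loop body does nothing but accumulate,
    // so there are no extra work vectors besides the reduction itself.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = new int[10_000];
        for (int i = 0; i < a.length; i++) {
            a[i] = i;
        }
        int result = 0;
        // Repeat the call so the method gets hot enough to be compiled by C2.
        for (int iter = 0; iter < 10_000; iter++) {
            result = sum(a);
        }
        System.out.println(result); // 0 + 1 + ... + 9999 = 49995000
    }
}
```

On a JDK build that contains the patch, the three settings described in the quoted text could then be compared by running the same class with `-XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=0`, `=1`, and `=2`.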
PR Review: https://git.openjdk.org/jdk/pull/25387#pullrequestreview-2864779943 From sparasa at openjdk.org Fri May 23 15:37:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 23 May 2025 15:37:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: On Thu, 22 May 2025 20:44:59 GMT, Srinivas Vamsi Parasa wrote: >>> > Hi @vamsi-parasa , I don't see demotion tests being generated with full mode gtest, i.e. python3 x86-asmtest.py --full >>> >>> Please see the updated `x86-asmtest.py` refactored to work with full set (`--full`). Please let me know if anything is missing. >> >> Hi @vamsi-parasa , >> I am seeing some failures with --full mode when ENABLE_DEMOTION=False >> /home/jatinbha/sandboxes/apx-release/jdk/test/hotspot/gtest/x86/test_assembler_x86.cpp:61: Failure >> Failed >> __ ecmovq (Assembler::Condition::greater, r31, r31, Address(rcx, rdx, (Address::ScaleFactor)0, +0x3c8d1915)); >> OpenJDK: cc cc cc cc cc cc cc cc cc cc cc >> GNU Assembler: 62 64 84 10 4f bc 11 15 19 8d 3c >> [ FAILED ] AssemblerX86.validate_vm (13562 ms) >> [----------] 1 test from AssemblerX86 (13708 ms total) > > Hi Jatin (@jatin-bhateja), > > Incorporated the changes suggested for cpu_family and is_P6_or_later() and other minor changes. Please let me know if everything looks good. > > Thanks, > Vamsi > @vamsi-parasa Testing launched, ping me again in 24h :) Thanks Emanuel (@eme64)! Please let me know if there are any issues with the tests.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2904838851 From fjiang at openjdk.org Fri May 23 15:38:36 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 23 May 2025 15:38:36 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v3] In-Reply-To: References: Message-ID: > Please consider. > As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. > > This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions. > > > Before: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 7 avgt 12 27.215 ± 0.073 ns/op > ArrayFill.fillByteArray 15 avgt 12 32.687 ± 0.904 ns/op > ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.006 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.351 ± 0.009 ns/op > ArrayFill.fillShortArray 7 avgt 12 30.776 ± 0.006 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.724 ± 0.447 ns/op > ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.006 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.685 ± 0.900 ns/op > ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.007 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.352 ± 0.011 ns/op > ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.006 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.497 ± 0.012 ns/op > > After: > Benchmark (size) Mode Cnt Score Error Units > ArrayFill.fillByteArray 7 avgt 12 20.137 ± 0.042 ns/op > ArrayFill.fillByteArray 15 avgt 12 32.928 ± 0.004 ns/op > ArrayFill.fillIntArray 7 avgt 12 28.630 ± 0.004 ns/op > ArrayFill.fillIntArray 15 avgt 12 29.344 ± 0.005 ns/op > ArrayFill.fillShortArray 7 avgt 12 31.494 ± 0.004 ns/op > ArrayFill.fillShortArray 15 avgt 12 31.492 ± 0.008 ns/op > ArrayFill.zeroByteArray 7 avgt 12 19.980 ± 0.164 ns/op > ArrayFill.zeroByteArray 15 avgt 12 32.927 ±
0.004 ns/op > ArrayFill.zeroIntArray 7 avgt 12 28.629 ± 0.005 ns/op > ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.006 ns/op > ArrayFill.zeroShortArray 7 avgt 12 32.193 ± 0.027 ns/op > ArrayFill.zeroShortArray 15 avgt 12 31.495 ± 0.010 ns/op > > > Testing: > - [x] tier1 Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into riscv-optimize-generate-fill - Merge branch 'openjdk:master' into riscv-optimize-generate-fill - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill - optimize array fill stub for small size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25350/files - new: https://git.openjdk.org/jdk/pull/25350/files/79e09023..54bab0bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=01-02 Stats: 591 lines in 28 files changed: 286 ins; 193 del; 112 mod Patch: https://git.openjdk.org/jdk/pull/25350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25350/head:pull/25350 PR: https://git.openjdk.org/jdk/pull/25350 From fjiang at openjdk.org Fri May 23 15:38:36 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 23 May 2025 15:38:36 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 06:19:37 GMT, Feilong Jiang wrote: >> Please consider. >> As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. >> >> This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions.
>> >> >> Before: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 7 avgt 12 27.215 ± 0.073 ns/op >> ArrayFill.fillByteArray 15 avgt 12 32.687 ± 0.904 ns/op >> ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.006 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.351 ± 0.009 ns/op >> ArrayFill.fillShortArray 7 avgt 12 30.776 ± 0.006 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.724 ± 0.447 ns/op >> ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.006 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.685 ± 0.900 ns/op >> ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.007 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.352 ± 0.011 ns/op >> ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.006 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.497 ± 0.012 ns/op >> >> After: >> Benchmark (size) Mode Cnt Score Error Units >> ArrayFill.fillByteArray 7 avgt 12 20.137 ± 0.042 ns/op >> ArrayFill.fillByteArray 15 avgt 12 32.928 ± 0.004 ns/op >> ArrayFill.fillIntArray 7 avgt 12 28.630 ± 0.004 ns/op >> ArrayFill.fillIntArray 15 avgt 12 29.344 ± 0.005 ns/op >> ArrayFill.fillShortArray 7 avgt 12 31.494 ± 0.004 ns/op >> ArrayFill.fillShortArray 15 avgt 12 31.492 ± 0.008 ns/op >> ArrayFill.zeroByteArray 7 avgt 12 19.980 ± 0.164 ns/op >> ArrayFill.zeroByteArray 15 avgt 12 32.927 ± 0.004 ns/op >> ArrayFill.zeroIntArray 7 avgt 12 28.629 ± 0.005 ns/op >> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.006 ns/op >> ArrayFill.zeroShortArray 7 avgt 12 32.193 ± 0.027 ns/op >> ArrayFill.zeroShortArray 15 avgt 12 31.495 ± 0.010 ns/op >> >> >> Testing: >> - [x] tier1 > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into riscv-optimize-generate-fill > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill > - optimize array fill stub for small size Here is the `ArrayFill` jmh result for the length from 1-7: Before: Benchmark (size) Mode Cnt Score Error Units ArrayFill.fillByteArray 1 avgt 12 20.052 ± 0.014 ns/op ArrayFill.fillByteArray 2 avgt 12 19.977 ± 0.049 ns/op ArrayFill.fillByteArray 3 avgt 12 21.474 ± 0.005 ns/op ArrayFill.fillByteArray 4 avgt 12 22.904 ± 0.005 ns/op ArrayFill.fillByteArray 5 avgt 12 24.336 ± 0.005 ns/op ArrayFill.fillByteArray 6 avgt 12 25.764 ± 0.001 ns/op ArrayFill.fillByteArray 7 avgt 12 27.199 ± 0.005 ns/op ArrayFill.fillByteArray 15 avgt 12 32.210 ± 0.005 ns/op ArrayFill.fillIntArray 1 avgt 12 21.191 ± 1.095 ns/op ArrayFill.fillIntArray 2 avgt 12 27.913 ± 0.004 ns/op ArrayFill.fillIntArray 3 avgt 12 28.628 ± 0.002 ns/op ArrayFill.fillIntArray 4 avgt 12 29.346 ± 0.005 ns/op ArrayFill.fillIntArray 5 avgt 12 29.348 ± 0.004 ns/op ArrayFill.fillIntArray 6 avgt 12 28.629 ± 0.005 ns/op ArrayFill.fillIntArray 7 avgt 12 28.636 ± 0.013 ns/op ArrayFill.fillIntArray 15 avgt 12 29.345 ± 0.007 ns/op ArrayFill.fillShortArray 1 avgt 12 19.474 ± 0.065 ns/op ArrayFill.fillShortArray 2 avgt 12 19.338 ± 0.058 ns/op ArrayFill.fillShortArray 3 avgt 12 20.143 ± 0.192 ns/op ArrayFill.fillShortArray 4 avgt 12 30.776 ± 0.004 ns/op ArrayFill.fillShortArray 5 avgt 12 30.778 ± 0.004 ns/op ArrayFill.fillShortArray 6 avgt 12 30.776 ± 0.006 ns/op ArrayFill.fillShortArray 7 avgt 12 30.779 ± 0.004 ns/op ArrayFill.fillShortArray 15 avgt 12 31.495 ± 0.005 ns/op ArrayFill.zeroByteArray 1 avgt 12 19.690 ± 0.288 ns/op ArrayFill.zeroByteArray 2 avgt 12 19.884 ± 0.093 ns/op ArrayFill.zeroByteArray 3 avgt 12 21.475 ± 0.005 ns/op ArrayFill.zeroByteArray 4 avgt 12 22.905 ± 0.005 ns/op ArrayFill.zeroByteArray 5 avgt 12 24.337 ±
0.005 ns/op ArrayFill.zeroByteArray 6 avgt 12 25.772 ± 0.011 ns/op ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.004 ns/op ArrayFill.zeroByteArray 15 avgt 12 32.209 ± 0.005 ns/op ArrayFill.zeroIntArray 1 avgt 12 19.609 ± 0.414 ns/op ArrayFill.zeroIntArray 2 avgt 12 27.919 ± 0.006 ns/op ArrayFill.zeroIntArray 3 avgt 12 28.631 ± 0.005 ns/op ArrayFill.zeroIntArray 4 avgt 12 29.353 ± 0.014 ns/op ArrayFill.zeroIntArray 5 avgt 12 29.345 ± 0.005 ns/op ArrayFill.zeroIntArray 6 avgt 12 28.632 ± 0.005 ns/op ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.004 ns/op ArrayFill.zeroIntArray 15 avgt 12 29.362 ± 0.030 ns/op ArrayFill.zeroShortArray 1 avgt 12 20.099 ± 0.102 ns/op ArrayFill.zeroShortArray 2 avgt 12 19.563 ± 0.452 ns/op ArrayFill.zeroShortArray 3 avgt 12 20.198 ± 0.443 ns/op ArrayFill.zeroShortArray 4 avgt 12 30.776 ± 0.004 ns/op ArrayFill.zeroShortArray 5 avgt 12 30.775 ± 0.004 ns/op ArrayFill.zeroShortArray 6 avgt 12 30.777 ± 0.006 ns/op ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.005 ns/op ArrayFill.zeroShortArray 15 avgt 12 31.492 ± 0.005 ns/op After: Benchmark (size) Mode Cnt Score Error Units ArrayFill.fillByteArray 1 avgt 12 19.442 ± 0.031 ns/op ArrayFill.fillByteArray 2 avgt 12 19.324 ± 0.001 ns/op ArrayFill.fillByteArray 3 avgt 12 19.326 ± 0.003 ns/op ArrayFill.fillByteArray 4 avgt 12 19.324 ± 0.002 ns/op ArrayFill.fillByteArray 5 avgt 12 19.566 ± 0.452 ns/op ArrayFill.fillByteArray 6 avgt 12 19.327 ± 0.004 ns/op ArrayFill.fillByteArray 7 avgt 12 20.146 ± 0.039 ns/op ArrayFill.fillByteArray 15 avgt 12 32.924 ± 0.005 ns/op ArrayFill.fillIntArray 1 avgt 12 20.040 ± 0.003 ns/op ArrayFill.fillIntArray 2 avgt 12 28.151 ± 0.449 ns/op ArrayFill.fillIntArray 3 avgt 12 28.634 ± 0.003 ns/op ArrayFill.fillIntArray 4 avgt 12 29.348 ± 0.005 ns/op ArrayFill.fillIntArray 5 avgt 12 29.338 ± 0.010 ns/op ArrayFill.fillIntArray 6 avgt 12 28.631 ± 0.007 ns/op ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.005 ns/op ArrayFill.fillIntArray 15 avgt 12 29.347 ±
0.006 ns/op ArrayFill.fillShortArray 1 avgt 12 20.675 ± 0.058 ns/op ArrayFill.fillShortArray 2 avgt 12 20.624 ± 0.942 ns/op ArrayFill.fillShortArray 3 avgt 12 19.852 ± 0.337 ns/op ArrayFill.fillShortArray 4 avgt 12 30.777 ± 0.005 ns/op ArrayFill.fillShortArray 5 avgt 12 30.538 ± 0.453 ns/op ArrayFill.fillShortArray 6 avgt 12 30.776 ± 0.005 ns/op ArrayFill.fillShortArray 7 avgt 12 31.493 ± 0.004 ns/op ArrayFill.fillShortArray 15 avgt 12 31.494 ± 0.005 ns/op ArrayFill.zeroByteArray 1 avgt 12 19.423 ± 0.018 ns/op ArrayFill.zeroByteArray 2 avgt 12 19.327 ± 0.003 ns/op ArrayFill.zeroByteArray 3 avgt 12 19.327 ± 0.003 ns/op ArrayFill.zeroByteArray 4 avgt 12 19.327 ± 0.003 ns/op ArrayFill.zeroByteArray 5 avgt 12 19.802 ± 0.452 ns/op ArrayFill.zeroByteArray 6 avgt 12 19.326 ± 0.003 ns/op ArrayFill.zeroByteArray 7 avgt 12 19.891 ± 0.139 ns/op ArrayFill.zeroByteArray 15 avgt 12 33.170 ± 0.464 ns/op ArrayFill.zeroIntArray 1 avgt 12 19.983 ± 0.112 ns/op ArrayFill.zeroIntArray 2 avgt 12 27.914 ± 0.004 ns/op ArrayFill.zeroIntArray 3 avgt 12 28.629 ± 0.004 ns/op ArrayFill.zeroIntArray 4 avgt 12 29.346 ± 0.004 ns/op ArrayFill.zeroIntArray 5 avgt 12 29.346 ± 0.005 ns/op ArrayFill.zeroIntArray 6 avgt 12 28.629 ± 0.003 ns/op ArrayFill.zeroIntArray 7 avgt 12 28.627 ± 0.003 ns/op ArrayFill.zeroIntArray 15 avgt 12 29.354 ± 0.018 ns/op ArrayFill.zeroShortArray 1 avgt 12 19.818 ± 0.339 ns/op ArrayFill.zeroShortArray 2 avgt 12 19.325 ± 0.003 ns/op ArrayFill.zeroShortArray 3 avgt 12 19.325 ± 0.003 ns/op ArrayFill.zeroShortArray 4 avgt 12 30.777 ± 0.005 ns/op ArrayFill.zeroShortArray 5 avgt 12 30.777 ± 0.005 ns/op ArrayFill.zeroShortArray 6 avgt 12 30.776 ± 0.006 ns/op ArrayFill.zeroShortArray 7 avgt 12 31.732 ± 0.905 ns/op ArrayFill.zeroShortArray 15 avgt 12 31.492 ±
0.003 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/25350#issuecomment-2904829785 From kvn at openjdk.org Fri May 23 15:57:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 15:57:53 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 06:19:42 GMT, Doug Simon wrote: >> Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). >> This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. >> The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25397#pullrequestreview-2864916858 From kvn at openjdk.org Fri May 23 16:06:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 16:06:52 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list In-Reply-To: References: Message-ID: On Fri, 23 May 2025 09:42:17 GMT, Aleksey Shipilev wrote: > See bug for more discussion. > > This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` Did you do performance compare? 
src/hotspot/share/compiler/compileTask.cpp line 81: > 79: > 80: CompileTask::~CompileTask() { > 81: assert(!lock()->is_locked(), "Should not be locked when freed"); Do we need to free `_lock` ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25409#issuecomment-2904951634 PR Review Comment: https://git.openjdk.org/jdk/pull/25409#discussion_r2104908263 From rriggs at openjdk.org Fri May 23 16:15:56 2025 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 23 May 2025 16:15:56 GMT Subject: RFR: 8355223: Improve documentation on @IntrinsicCandidate [v7] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 21:31:16 GMT, Chen Liang wrote: >> In offline discussion, we noted that the documentation on this annotation does not recommend minimizing the intrinsified section and moving whatever can be done in Java to Java; thus I prepared this documentation update, to shrink a "TLDR" essay to something concise for readers, such as pointing to that list at `vmIntrinsics.hpp` instead of "a list". > > Chen Liang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: > > - More review updates > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Move intrinsic to be a subsection; just one most common function of the annotation > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Merge branch 'master' of https://github.com/openjdk/jdk into doc/intrinsic-candidate > - Update src/java.base/share/classes/jdk/internal/vm/annotation/IntrinsicCandidate.java > > Co-authored-by: Raffaello Giulietti > - Shorter first sentence > - Updates, thanks to John > - Refine validation and defensive copying > - 8355223: Improve documentation on @IntrinsicCandidate The bulk of these comments belong in a design/rational doc for VM intrinsics, not in the javadoc of an annotation. ------------- PR Review: https://git.openjdk.org/jdk/pull/24777#pullrequestreview-2864975777 From dnsimon at openjdk.org Fri May 23 16:33:04 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 16:33:04 GMT Subject: RFR: 8357581: [JVMCI] Add HotSpotProfilingInfo [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 06:19:42 GMT, Doug Simon wrote: >> Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). >> This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. >> The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fix copyright Thanks for the reviews. 
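The rename described in the quoted PR text for 8357581 (the old private class becomes `HotSpotProfilingInfoImpl`, and the `HotSpotProfilingInfo` name is reused for a new public interface) can be illustrated with a minimal sketch. Every type and method name below carries a `Sketch` suffix or is otherwise hypothetical: the source does not show the real JVMCI signatures, so `getDecompileCount()`, `isMature()`, and the threshold check are assumptions for illustration only.

```java
// Hypothetical stand-in for a generic, VM-independent profiling interface.
interface ProfilingInfoSketch {
    boolean isMature();
}

// The public, VM-specific interface that takes over the old class name.
interface HotSpotProfilingInfoSketch extends ProfilingInfoSketch {
    // Hypothetical accessor for the decompilation counter mentioned in the PR text.
    int getDecompileCount();
}

// The pre-existing implementation class, now living under an *Impl name.
final class HotSpotProfilingInfoImplSketch implements HotSpotProfilingInfoSketch {
    private final int decompileCount;

    HotSpotProfilingInfoImplSketch(int decompileCount) {
        this.decompileCount = decompileCount;
    }

    @Override
    public boolean isMature() {
        return true;
    }

    @Override
    public int getDecompileCount() {
        return decompileCount;
    }
}

public class ProfilingInfoDemo {
    // A client can query VM-specific data through the public interface
    // without ever depending on the implementation class.
    static boolean looksLikeDeoptCycle(ProfilingInfoSketch info, int threshold) {
        if (info instanceof HotSpotProfilingInfoSketch) {
            HotSpotProfilingInfoSketch hs = (HotSpotProfilingInfoSketch) info;
            return hs.getDecompileCount() >= threshold;
        }
        return false;
    }

    public static void main(String[] args) {
        ProfilingInfoSketch info = new HotSpotProfilingInfoImplSketch(5);
        System.out.println(looksLikeDeoptCycle(info, 3)); // true, since 5 >= 3
    }
}
```

The point of the split is that callers such as a compiler can program against the interface while the implementation stays free to change.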
------------- PR Comment: https://git.openjdk.org/jdk/pull/25397#issuecomment-2905036890 From dnsimon at openjdk.org Fri May 23 16:33:05 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 23 May 2025 16:33:05 GMT Subject: Integrated: 8357581: [JVMCI] Add HotSpotProfilingInfo In-Reply-To: References: Message-ID: On Thu, 22 May 2025 17:12:34 GMT, Doug Simon wrote: > Graal is adding enhanced logic to detect deoptimization cycles and needs to be able to query a method's decompilation counter (i.e. `MethodData::_compiler_counters._nof_decompiles`). > This PR adds the `HotSpotProfilingInfo` interface so that such HotSpot-specific profiling info can be accessed. > The change looks bigger in the GitHub review UI than it really is. I have simply renamed the pre-existing `HotSpotProfilingInfo` private class as `HotSpotProfilingInfoImpl` and repurposed the `HotSpotProfilingInfo` name for the *new* public interface. This pull request has now been integrated. Changeset: 2b6b7661 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/2b6b7661b949971fe776714795d7dd46ed343cde Stats: 235 lines in 5 files changed: 17 ins; 194 del; 24 mod 8357581: [JVMCI] Add HotSpotProfilingInfo Reviewed-by: kvn, never ------------- PR: https://git.openjdk.org/jdk/pull/25397 From never at openjdk.org Fri May 23 17:14:57 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 23 May 2025 17:14:57 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v16] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 20:23:22 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. 
The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with three additional commits since the last revision: > > - Update tests > - Exclude JVMCI methods > - Create nmethod relocation stress test I'd like to clarify a bit what's actually done here. Some JVMCI compilation can have an associated instance of InstalledCode that has value written into it by hotspot that point at the nmethod* and the verified entry point. If the mirror object is reclaimed by the garbage collector before the nmethod dies, the mirror field will be cleared. Graal may read those fields but will never write them. JVMCI compilations initiated by the CompileBroker will never have an associated mirror. The mirror object is associated with the method at construction time and will never be changed. So it's not necessary to exclude all JVMCI compiled nmethods from this relocation, only ones which have a non-null mirror object. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2905181373 From shade at openjdk.org Fri May 23 17:20:57 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 23 May 2025 17:20:57 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list In-Reply-To: References: Message-ID: <5iF7f0nfb51TJZT9YJUd_I4opw3iH-I7LOOMPZYYtAg=.155b676c-e12c-4b03-a296-51d0cb86e92c@github.com> On Fri, 23 May 2025 16:03:55 GMT, Vladimir Kozlov wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. 
We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [ ] Linux AArch64 server fastdebug, `all` > > src/hotspot/share/compiler/compileTask.cpp line 81: > >> 79: >> 80: CompileTask::~CompileTask() { >> 81: assert(!lock()->is_locked(), "Should not be locked when freed"); > > Do we need to free `_lock` ? Ah yes, duh! I cherry-picked this hunk from my other PR that moves this lock to global, so it does not require delete. For this PR, we need `delete lock;` as well. Will add. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25409#discussion_r2105079218 From shade at openjdk.org Fri May 23 17:34:36 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 23 May 2025 17:34:36 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v2] In-Reply-To: References: Message-ID: > See bug for more discussion. > > This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8357473-compile-task-free-list - Also free the lock! 
- Comments and indenting - Basic deletion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25409/files - new: https://git.openjdk.org/jdk/pull/25409/files/b9319107..c26b8ae8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25409&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25409&range=00-01 Stats: 562 lines in 18 files changed: 286 ins; 177 del; 99 mod Patch: https://git.openjdk.org/jdk/pull/25409.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25409/head:pull/25409 PR: https://git.openjdk.org/jdk/pull/25409 From cslucas at openjdk.org Fri May 23 18:26:51 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 23 May 2025 18:26:51 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details In-Reply-To: <4BJrQqFckuixWGmZqmFzd-rTJ-mJrlo17pNl_mIWn-M=.d5f308f3-3aba-44ec-9ea7-d030d97a997e@github.com> References: <4BJrQqFckuixWGmZqmFzd-rTJ-mJrlo17pNl_mIWn-M=.d5f308f3-3aba-44ec-9ea7-d030d97a997e@github.com> Message-ID: On Fri, 23 May 2025 09:29:24 GMT, Aleksey Shipilev wrote: >> Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. >> >> Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. > > A bit concerned about performance impact of this logging, especially since we are under `CodeCache_lock`. So I would suggest two improvements: > > 1. Maybe move logging before acquiring `CodeCache_lock`? Not sure if it is safe for various `CodeCache::*` getters. > > 2. Predicate the argument preparation/logging with: > > ``` > LogTarget(Debug, codecache) lt; > if (lt.is_enabled()) { > ... Thank you for the comments @shipilev . 
TBH I totally overlooked that lock there and I didn't know about the `LogTarget::is_enabled()` ------------- PR Comment: https://git.openjdk.org/jdk/pull/25402#issuecomment-2905412473 From shade at openjdk.org Fri May 23 18:42:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 23 May 2025 18:42:55 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list In-Reply-To: References: Message-ID: On Fri, 23 May 2025 16:03:05 GMT, Vladimir Kozlov wrote: > Did you do performance compare? Both issues in `CompileTasks` are about footprint. I expect no significant difference on throughput, given how few compile tasks we normally allocate. On my ad-hoc runs it is more or less a wash: $ taskset -c 0-3 hyperfine -w 30 -r 300 \ "build/linux-x86_64-server-release/images/jdk/bin/java -XX:TieredStopAtLevel=1 Hello.java" # Baseline Time (mean ± σ): 353.4 ms ± 1.8 ms [User: 482.3 ms, System: 88.3 ms] Range (min … max): 348.0 ms … 358.6 ms 300 runs # No free list -- this PR Time (mean ± σ): 354.5 ms ± 1.6 ms [User: 482.8 ms, System: 89.4 ms] Range (min … max): 350.4 ms … 360.1 ms 300 runs # Global blocking mutex -- #25364 Time (mean ± σ): 354.3 ms ± 1.7 ms [User: 482.1 ms, System: 89.4 ms] Range (min … max): 350.0 ms … 359.8 ms 300 runs # This PR + #25364 Time (mean ± σ): 354.4 ms ± 1.8 ms [User: 482.2 ms, System: 88.6 ms] Range (min … max): 349.8 ms … 360.8 ms 300 runs ------------- PR Comment: https://git.openjdk.org/jdk/pull/25409#issuecomment-2905450866 From dlong at openjdk.org Fri May 23 19:31:56 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 23 May 2025 19:31:56 GMT Subject: RFR: 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 In-Reply-To: References: Message-ID: <85Yuyrexb1H0cnjHoDMl-7dNp9SD7k5_paavrVO1uJM=.c5072a8f-2019-4046-bc8f-18ac1461095b@github.com> On Thu, 22 May 2025 23:43:09 GMT, Dean Long wrote: > This appears to be mostly harmless, but we should fix it anyway.
The initial sentinel PcDesc has a pc_offset of -1. We can prevent looking before the sentinel by reversing the condition so that pc[0] is checked before pc[-1]. Thanks Tobias. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25404#issuecomment-2905581315 From dlong at openjdk.org Fri May 23 19:31:57 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 23 May 2025 19:31:57 GMT Subject: Integrated: 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 In-Reply-To: References: Message-ID: On Thu, 22 May 2025 23:43:09 GMT, Dean Long wrote: > This appears to be mostly harmless, but we should fix it anyway. The initial sentinel PcDesc has a pc_offset of -1. We can prevent looking before the sentinel by reversing the condition so that pc[0] is checked before pc[-1]. This pull request has now been integrated. Changeset: 66747710 Author: Dean Long URL: https://git.openjdk.org/jdk/commit/66747710a49ea6a78aee94d3a3ec6a24b7cc36e5 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8357468: [asan] heap buffer overflow reported in PcDesc::pc_offset() pcDesc.hpp:57 Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25404 From kvn at openjdk.org Fri May 23 20:29:51 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 23 May 2025 20:29:51 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v2] In-Reply-To: References: Message-ID: <0q_Bc09m4PmQavzyfyNUlhZ93UrYdSO0IM2wu7euql8=.58334ed4-ebbc-4105-b96a-db54d046848a@github.com> On Fri, 23 May 2025 17:34:36 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. We will remerge the other one once either integrates. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion Good. Let me test it. ------------- PR Review: https://git.openjdk.org/jdk/pull/25409#pullrequestreview-2865636409 From duke at openjdk.org Fri May 23 20:54:43 2025 From: duke at openjdk.org (Zihao Lin) Date: Fri, 23 May 2025 20:54:43 GMT Subject: RFR: 8344116: C2: remove slice parameter from LoadNode::make [v7] In-Reply-To: References: Message-ID: > This patch remove slice parameter from LoadNode::make > > Mention in https://github.com/openjdk/jdk/pull/21834#pullrequestreview-2429164805 > > Hi team, I am new, I'd appreciate any guidance. Thank a lot! Zihao Lin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'openjdk:master' into 8344116 - Merge branch 'openjdk:master' into 8344116 - Fix build - Fix test failed - 8344116: C2: remove slice parameter from LoadNode::make ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24258/files - new: https://git.openjdk.org/jdk/pull/24258/files/3efb1c17..ea83736e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24258&range=05-06 Stats: 393670 lines in 4531 files changed: 146248 ins; 225477 del; 21945 mod Patch: https://git.openjdk.org/jdk/pull/24258.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24258/head:pull/24258 PR: https://git.openjdk.org/jdk/pull/24258 From vlivanov at openjdk.org Fri May 23 22:08:55 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 May 2025 22:08:55 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 11:10:15 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. 
>> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More touchups I feel uneasy about all the complications introduced by coordination between accessors. It looks like supporting concurrent release operation adds a lot of complexity. Weak -> strong transition is monotonic, so shouldn't need as much care. What do you think about making release operation part of CompileTask recycling (e.g., in `UnloadableMethodHandle` destructor)? By the time it happens, there should not be any other users of the task. (Otherwise, recycling concurrently accesses task is unsafe anyway). ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2865813410 From vlivanov at openjdk.org Fri May 23 22:43:35 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 May 2025 22:43:35 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: > This PR introduces C2 support for `Reference.reachabilityFence()`. 
> > After [JDK-8199462](https://bugs.openjdk.org/browse/JDK-8199462) went in, it was discovered that C2 may break the invariant the fix relied upon [1]. So, this is an attempt to introduce proper support for `Reference.reachabilityFence()` in C2. C1 is left intact for now, because there are no signs yet it is affected. > > `Reference.reachabilityFence()` can be used in performance critical code, so the primary goal for C2 is to reduce its runtime overhead as much as possible. The ultimate goal is to ensure liveness information is attached to interfering safepoints, but it takes multiple steps to properly propagate the information through compilation pipeline without negatively affecting generated code quality. > > Also, I don't consider this fix as complete. It does fix the reported problem, but it doesn't provide any strong guarantees yet. In particular, since `ReachabilityFence` is CFG-only node, nothing explicitly forbids memory operations to float past `Reference.reachabilityFence()` and potentially reaching some other safepoints current analysis treats as non-interfering. Representing `ReachabilityFence` as memory barrier (e.g., `MemBarCPUOrder`) would solve the issue, but performance costs are prohibitively high. Alternatively, the optimization proposed in this PR can be improved to conservatively extend referent's live range beyond `ReachabilityFence` nodes associated with it. It would meet performance criteria, but I prefer to implement it as a followup fix. > > Another known issue relates to reachability fences on constant oops. If such constant is GCed (most likely, due to a bug in Java code), similar reachability issues may arise. For now, RFs on constants are treated as no-ops, but there's a diagnostic flag `PreserveReachabilityFencesOnConstants` to keep the fences. I plan to address it separately. 
> > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/ref/Reference.java#L667 > "HotSpot JVM retains the ref and does not GC it before a call to this method, because the JIT-compilers do not have GC-only safepoints." > > Testing: > - [x] hs-tier1 - hs-tier8 > - [x] hs-tier1 - hs-tier6 w/ -XX:+StressReachabilityFences -XX:+VerifyLoopOptimizations > - [x] java/lang/foreign microbenchmarks Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: renaming ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25315/files - new: https://git.openjdk.org/jdk/pull/25315/files/ad314e05..ca2809af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25315&range=01-02 Stats: 5 lines in 4 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25315/head:pull/25315 PR: https://git.openjdk.org/jdk/pull/25315 From vlivanov at openjdk.org Fri May 23 22:45:56 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 23 May 2025 22:45:56 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: <_y8LMhDg7b8EuWLikdsmgK0nUCCh0Y2PH0LX_-TpsD4=.fa9bb975-a850-4223-825e-726dcc5a74f2@github.com> Message-ID: On Fri, 23 May 2025 07:24:38 GMT, Christian Hagedorn wrote: >> IMO it doesn't clarify things much. It's enum constant name and it's not shown in IGV. >> >> It is used in a single place [1] where it's clear what it refers to. >> >> [1] >> >> { // No more loop opts. It is safe to eliminate reachability fence nodes. >> TracePhase tp(_t_idealLoop); >> PhaseIdealLoop::optimize(igvn, LoopOptsEliminateRFs); >> print_method(PHASE_OPTIMIZE_RF, 2); >> if (failing()) return; >> } > > That's a good point, there it really does not matter. 
Another thought I've just had: When you add it to the `CompilePhase` class in the IR framework and it would be nice to have the same name. There, it would be beneficial to have the full name since people then only see `CompilePhase::OPTIMIZE_RF`. That's fair. Renamed and added to `CompilePhase.java`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2105497776 From vlivanov at openjdk.org Sat May 24 01:49:16 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 24 May 2025 01:49:16 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: On Sun, 18 May 2025 07:06:41 GMT, Quan Anh Mai wrote: >> Hi, >> >> The issue here is that the `CastLLNode` is created before the actual check that ensures the range of the input. This patch fixes it by moving the creation to the correct place, which is under `inline_block`. I also noticed that the code there seems incorrect and confusing. `ArrayCopyNode::get_partial_inline_vector_lane_count` takes the length of the array, not the size in bytes. If you look into the method it will multiply `const_len` with `type2aelementbytes(bt)` to get the size in bytes of the array. In the runtime test, we compare `length << log2(type2bytes(bt))` with `ArrayOperationPartialInlineSize`. This seems confusing, why don't we just compare `length` with `ArrayOperationPartialInlineSize / type2bytes(bt)`, it also unifies the test with the actual cast. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: > > - fix comment > - fix comment Overall, looks good. src/hotspot/share/opto/macroArrayCopy.cpp line 209: > 207: int inline_limit = ArrayOperationPartialInlineSize / type2aelembytes(type); > 208: > 209: const TypeLong* length_type = _igvn.type(length)->isa_long(); Any particular benefit in eagerly pruning the block? 
It duplicates post-expansion GVN checks of the branch condition. (If it were normal parsing with prompt GVN analysis, you could detect the branch is dead right after `generate_guard` call.) Alternatively, the checks are equivalent to checking that join of `length_type` with `[0...inline_limit]` is not empty. But I prefer to let GVN handle it. ------------- PR Review: https://git.openjdk.org/jdk/pull/25284#pullrequestreview-2866012522 PR Review Comment: https://git.openjdk.org/jdk/pull/25284#discussion_r2105660733 From qamai at openjdk.org Sat May 24 08:46:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 24 May 2025 08:46:50 GMT Subject: RFR: 8355574: Fatal error in abort_verify_int_in_range due to Invalid CastII [v4] In-Reply-To: References: Message-ID: On Sat, 24 May 2025 01:46:25 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: >> >> - fix comment >> - fix comment > > src/hotspot/share/opto/macroArrayCopy.cpp line 209: > >> 207: int inline_limit = ArrayOperationPartialInlineSize / type2aelembytes(type); >> 208: >> 209: const TypeLong* length_type = _igvn.type(length)->isa_long(); > > Any particular benefit in eagerly pruning the block? It duplicates post-expansion GVN checks of the branch condition. (If it were normal parsing with prompt GVN analysis, you could detect the branch is dead right after `generate_guard` call.) > > Alternatively, the checks are equivalent to checking that join of `length_type` with `[0...inline_limit]` is not empty. But I prefer to let GVN handle it. I think it is a trivial check and it is much more efficient than creating a bunch of nodes and removing them later. 
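The two ways of looking at the guard in this exchange can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the HotSpot code: `Range`, `join`, `inline_limit`, and `may_inline` are hypothetical names, and plain integers stand in for `TypeLong` and `ArrayOperationPartialInlineSize`. The limit is expressed in elements rather than bytes, and the inline block is considered reachable only when the length range intersects `[0, inline_limit]`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical closed integer range standing in for a TypeLong's [lo, hi].
struct Range {
  long lo;
  long hi;
  bool empty() const { return lo > hi; }
};

// Intersection of two ranges (the "join" mentioned in the review).
inline Range join(Range a, Range b) {
  return Range{std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
}

// Limit in elements rather than bytes: partial_inline_bytes / element_bytes.
inline long inline_limit(long partial_inline_bytes, long element_bytes) {
  assert(element_bytes > 0);  // element size must be positive
  return partial_inline_bytes / element_bytes;
}

// The inline block can only be taken if some length in length_type
// falls inside [0, limit].
inline bool may_inline(Range length_type, long limit) {
  return !join(length_type, Range{0, limit}).empty();
}
```

For example, with a 32-byte partial inline size and 4-byte elements the limit is 8 elements, so a length known to be at least 9 can never take the inline path, while a length in `[0, 100]` still can.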
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25284#discussion_r2105761153 From hgreule at openjdk.org Sun May 25 09:48:53 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sun, 25 May 2025 09:48:53 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: <_gPxiLsD6OkEvDuqhkRHfnCHNXKr2YavvaAEhXLHo0U=.95cee2ba-d56c-4bba-83f1-4ea6b0389e50@github.com> On Mon, 19 May 2025 16:00:48 GMT, Hannes Greule wrote: >> src/hotspot/share/opto/divnode.cpp line 1229: >> >>> 1227: // Mod by zero? Throw exception at runtime! >>> 1228: if (i2->is_con() && i2->get_con() == 0) { >>> 1229: return TypeInt::ZERO; >> >> Like @merykitty , I am unsure of returning zero in this case. The original code probably returned TypeInt::POS for the same reason you bring up below: >> >>> JVMS `irem` bytecode: "the result of the remainder operation can be negative only if the dividend is negative and can be positive only if the dividend is positive" >> >> Hence, I would argue to keep that old behavior, since the result of a modulo with zero is not defined to be zero. >> >> I like the idea of returning TOP, but that needs to be tested really well, since all uses of the modulo computation will get removed. I am not familiar enough with the type lattice to reason about the formal correctness of this. > >> The original code probably returned TypeInt::POS for the same reason you bring up below: > > I doubt that, as it doesn't account for the sign of the dividend at all here. We also can't keep the existing behavior (see the section about monotonicity in the PR description). > From my understanding, the node should also be kept alive no matter the value due to its control input. > > I'll test with returning TOP.
Using TOP seems to work, but I'm still a bit hesitant to use it: - I'm not sure if it is intended to be used this way - `TypeNode`s get a special handling when they become TOP. That is not a problem as long as the Mod nodes aren't `TypeNode`s, but it looks rather dangerous. I think I'd still prefer to use ZERO unless someone else can guarantee that using TOP will be fine. I could maybe give a more in-depth explanation why ZERO behaves monotonic here as a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2106144665 From qamai at openjdk.org Sun May 25 11:29:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 25 May 2025 11:29:51 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: <_gPxiLsD6OkEvDuqhkRHfnCHNXKr2YavvaAEhXLHo0U=.95cee2ba-d56c-4bba-83f1-4ea6b0389e50@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <_gPxiLsD6OkEvDuqhkRHfnCHNXKr2YavvaAEhXLHo0U=.95cee2ba-d56c-4bba-83f1-4ea6b0389e50@github.com> Message-ID: On Sun, 25 May 2025 09:46:27 GMT, Hannes Greule wrote: >>> The original code probably returned TypeInt::POS for the same reason you bring up below: >> >> I doubt that, as it doesn't account for the sign of the dividend at all here. We also can't keep the existing behavior (see the section about monotonicity in the PR description). >> From my understanding, the node should also be kept alive no matter the value due to its control input. >> >> I'll test with returning TOP. > > Using TOP seems to work, but I'm still a bit hesitant to use it: > - I'm not sure if it is intended to be used this way > - `TypeNode`s get a special handling when they become TOP. That is not a problem as long as the Mod nodes aren't `TypeNode`s, but it looks rather dangerous. > > I think I'd still prefer to use ZERO unless someone else can guarantee that using TOP will be fine. 
I could maybe give a more in-depth explanation why ZERO behaves monotonic here as a comment. Please don't use zero. Using zero is NOT fine. > I'm not sure if it is intended to be used this way Yes it is, `TOP` denotes an empty set, and the result of `ModI/L` with the divisor being a constant zero is an empty set. > `TypeNode`s get a special handling when they become `TOP`. That is not a problem as long as the `Mod` nodes aren't `TypeNode`s, but it looks rather dangerous. `TypeNode` kills all paths below it if it becomes `TOP`, I think it is the expected behaviour when the divisor is a constant zero in that path. > I could maybe give a more in-depth explanation why ZERO behaves monotonic here as a comment. You are just lucky using zero in this particular operation and it seems to be monotonic, but if you apply the same logic to similar operations, like `Div`, `UMod`, you will see that zero can make them non-monotonic. As a result, I can confidently say that using zero here is incorrect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2106171097 From hgreule at openjdk.org Sun May 25 11:51:51 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sun, 25 May 2025 11:51:51 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> <_gPxiLsD6OkEvDuqhkRHfnCHNXKr2YavvaAEhXLHo0U=.95cee2ba-d56c-4bba-83f1-4ea6b0389e50@github.com> Message-ID: On Sun, 25 May 2025 11:27:36 GMT, Quan Anh Mai wrote: > `TypeNode` kills all paths below it if it becomes `TOP`, I think it is the expected behaviour when the divisor is a constant zero in that path. I think I just misread the usage of `_maybe_top_type_nodes` in PhaseCCP. It checks whether the type is still TOP after the analysis, so starting with TOP is fine. In that case, I don't see any problem with TOP anymore. Thanks! 
> You are just lucky using zero in this particular operation and it seems to be monotonic, but if you apply the same logic to similar operations, like `Div`, `UMod`, you will see that zero can make them non-monotonic. As a result, I can confidently say that using zero here is incorrect. Yes, I was referring to only the current state of the changed functions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2106176218 From hgreule at openjdk.org Sun May 25 13:17:49 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Sun, 25 May 2025 13:17:49 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. > > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. 
I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. > > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508. > > While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
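The monotonicity example from the PR description can be checked with a small sketch. It assumes a simplified model in which a type is a closed `int` range and the meet of two ranges is the smallest range containing both. `Range`, `same`, `meet`, and `mod_by_constant` are hypothetical helpers for illustration, not C2's `TypeInt` API.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical stand-in for an integer type: the closed range [lo, hi].
struct Range {
  int lo;
  int hi;
};

inline bool same(Range a, Range b) { return a.lo == b.lo && a.hi == b.hi; }

// Simplified meet: the smallest range that contains both inputs.
inline Range meet(Range a, Range b) {
  assert(a.lo <= a.hi && b.lo <= b.hi);  // both ranges must be non-empty
  return Range{std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}

// Range of x % d for a known positive constant divisor d and arbitrary int x.
inline Range mod_by_constant(int d) {
  assert(d > 0);
  return Range{-(d - 1), d - 1};
}
```

Starting from `Range{0, INT_MAX}` (the `>=0` result previously used for a zero divisor) and meeting it with `mod_by_constant(3)` yields `[-2, INT_MAX]` rather than `[-2, 2]`, which is the mismatch the description refers to.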
Hannes Greule has updated the pull request incrementally with three additional commits since the last revision: - Update ModL comment - Use TOP instead of ZERO - Apply suggested test changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25254/files - new: https://git.openjdk.org/jdk/pull/25254/files/20a19bf5..20fe91d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=00-01 Stats: 33 lines in 3 files changed: 6 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From cslucas at openjdk.org Sun May 25 23:23:36 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Sun, 25 May 2025 23:23:36 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v2] In-Reply-To: References: Message-ID: <5A2bSd3m_9tKlhT38_oGxLN-fywDsODjSAD2ThA6418=.4d58b1a1-75e6-4a99-8d51-b65dc381b94a@github.com> > Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. > > Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Move before lock & check logging ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25402/files - new: https://git.openjdk.org/jdk/pull/25402/files/ef10a91a..f6c64755 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25402&range=00-01 Stats: 22 lines in 1 file changed: 7 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/25402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25402/head:pull/25402 PR: https://git.openjdk.org/jdk/pull/25402 From cslucas at openjdk.org Sun May 25 23:31:52 2025 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Sun, 25 May 2025 23:31:52 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v2] In-Reply-To: <4BJrQqFckuixWGmZqmFzd-rTJ-mJrlo17pNl_mIWn-M=.d5f308f3-3aba-44ec-9ea7-d030d97a997e@github.com> References: <4BJrQqFckuixWGmZqmFzd-rTJ-mJrlo17pNl_mIWn-M=.d5f308f3-3aba-44ec-9ea7-d030d97a997e@github.com> Message-ID: <2giliIL1ePdnWxr4yPtj59XBb5rWzTKJbjVBmcH4LSQ=.38409a6d-2f45-464d-9b91-f15f1a7aa6fe@github.com> On Fri, 23 May 2025 09:29:24 GMT, Aleksey Shipilev wrote: >> Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: >> >> Move before lock & check logging > > A bit concerned about performance impact of this logging, especially since we are under `CodeCache_lock`. So I would suggest two improvements: > > 1. Maybe move logging before acquiring `CodeCache_lock`? Not sure if it is safe for various `CodeCache::*` getters. > > 2. Predicate the argument preparation/logging with: > > ``` > LogTarget(Debug, codecache) lt; > if (lt.is_enabled()) { > ... @shipilev - I made changes following your suggestion. I moved the printing to before acquiring the lock and I also moved it to under a if checking if logging is enabled. 
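The shape of that change can be sketched roughly as follows. This is a self-contained mock under stated assumptions, not the actual patch: `MockLogTarget`, `gather_stats`, `flush_nmethod`, and `g_code_cache_lock` are hypothetical stand-ins for HotSpot's `LogTarget`, the `CodeCache::*` getters, the flushing code, and `CodeCache_lock`. It shows the message being prepared only when the target is enabled, and before the lock is taken.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a unified-logging target; not HotSpot's LogTarget.
class MockLogTarget {
 public:
  explicit MockLogTarget(bool enabled) : _enabled(enabled), _printed(0) {}
  bool is_enabled() const { return _enabled; }
  void print(const std::string& msg) {
    assert(_enabled);  // callers are expected to check is_enabled() first
    _last = msg;
    ++_printed;
  }
  int printed() const { return _printed; }
 private:
  bool _enabled;
  int _printed;
  std::string _last;
};

static int g_stats_computed = 0;      // counts how often the costly path runs
static std::mutex g_code_cache_lock;  // stands in for CodeCache_lock

// Stands in for gathering code cache statistics, assumed to be costly.
inline std::string gather_stats() {
  ++g_stats_computed;
  return "flushing nmethod: <details>";
}

inline void flush_nmethod(MockLogTarget& lt) {
  // Prepare and emit the message only when the target is enabled,
  // and do so before taking the lock.
  if (lt.is_enabled()) {
    lt.print(gather_stats());
  }
  std::lock_guard<std::mutex> guard(g_code_cache_lock);
  // ... the actual flushing work would happen here ...
}
```

With a disabled target, `gather_stats()` is never called, so the cost of building the message is paid only when debug logging is actually on.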
I don't think these get/logging operation in particular need to be under a lock. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25402#issuecomment-2908154023 From fyang at openjdk.org Mon May 26 00:46:52 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 26 May 2025 00:46:52 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 15:38:36 GMT, Feilong Jiang wrote: >> Please consider. >> As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. >> >> This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions.
>>
>> Before:
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayFill.fillByteArray 7 avgt 12 27.215 ± 0.073 ns/op
>> ArrayFill.fillByteArray 15 avgt 12 32.687 ± 0.904 ns/op
>> ArrayFill.fillIntArray 7 avgt 12 28.629 ± 0.006 ns/op
>> ArrayFill.fillIntArray 15 avgt 12 29.351 ± 0.009 ns/op
>> ArrayFill.fillShortArray 7 avgt 12 30.776 ± 0.006 ns/op
>> ArrayFill.fillShortArray 15 avgt 12 31.724 ± 0.447 ns/op
>> ArrayFill.zeroByteArray 7 avgt 12 27.199 ± 0.006 ns/op
>> ArrayFill.zeroByteArray 15 avgt 12 32.685 ± 0.900 ns/op
>> ArrayFill.zeroIntArray 7 avgt 12 28.630 ± 0.007 ns/op
>> ArrayFill.zeroIntArray 15 avgt 12 29.352 ± 0.011 ns/op
>> ArrayFill.zeroShortArray 7 avgt 12 30.776 ± 0.006 ns/op
>> ArrayFill.zeroShortArray 15 avgt 12 31.497 ± 0.012 ns/op
>>
>> After:
>> Benchmark (size) Mode Cnt Score Error Units
>> ArrayFill.fillByteArray 7 avgt 12 20.137 ± 0.042 ns/op
>> ArrayFill.fillByteArray 15 avgt 12 32.928 ± 0.004 ns/op
>> ArrayFill.fillIntArray 7 avgt 12 28.630 ± 0.004 ns/op
>> ArrayFill.fillIntArray 15 avgt 12 29.344 ± 0.005 ns/op
>> ArrayFill.fillShortArray 7 avgt 12 31.494 ± 0.004 ns/op
>> ArrayFill.fillShortArray 15 avgt 12 31.492 ± 0.008 ns/op
>> ArrayFill.zeroByteArray 7 avgt 12 19.980 ± 0.164 ns/op
>> ArrayFill.zeroByteArray 15 avgt 12 32.927 ± 0.004 ns/op
>> ArrayFill.zeroIntArray 7 avgt 12 28.629 ± 0.005 ns/op
>> ArrayFill.zeroIntArray 15 avgt 12 29.346 ± 0.006 ns/op
>> ArrayFill.zeroShortArray 7 avgt 12 32.193 ± 0.027 ns/op
>> ArrayFill.zeroShortArray 15 avgt 12 31.495 ± 0.010 ns/op
>>
>> Testing: >> - [x] tier1 > > Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into riscv-optimize-generate-fill > - Merge branch 'openjdk:master' into riscv-optimize-generate-fill > - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill > - optimize array fill stub for small size Looks good. Thanks. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25350#pullrequestreview-2867010574 From dzhang at openjdk.org Mon May 26 02:57:48 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 26 May 2025 02:57:48 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector Message-ID: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Hi all, Please take a look and review this PR, thanks! Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context.
### Testing * [ ] Linux riscv64 server release build on SG2042 ------------- Commit messages: - 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector Changes: https://git.openjdk.org/jdk/pull/25438/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357695 Stats: 31 lines in 2 files changed: 16 ins; 15 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25438/head:pull/25438 PR: https://git.openjdk.org/jdk/pull/25438 From amitkumar at openjdk.org Mon May 26 04:11:46 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 26 May 2025 04:11:46 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: > Unsafe::setMemory intrinsic implementation for s390x. > > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 
0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: switch to vector stores ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24480/files - new: https://git.openjdk.org/jdk/pull/24480/files/f75209f5..d79a841f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24480&range=03-04 Stats: 144 lines in 1 file changed: 42 ins; 74 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/24480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24480/head:pull/24480 PR: https://git.openjdk.org/jdk/pull/24480 From amitkumar at openjdk.org Mon May 26 04:11:47 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 26 May 2025 04:11:47 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: On Wed, 23 Apr 2025 06:09:25 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 
0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > improved mvc implementation
Tier-1 tests are clean with fastdebug-vm. These are the performance numbers on my z16 zVM:
Benchmark                       (aligned)  (size)  Mode  Cnt  Score   Error  Units
MemorySegmentZeroUnsafe.panama       true       1  avgt   30  2.889 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama       true       2  avgt   30  3.115 ± 0.014  ns/op
MemorySegmentZeroUnsafe.panama       true       3  avgt   30  3.271 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama       true       4  avgt   30  3.382 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama       true       5  avgt   30  3.295 ± 0.062  ns/op
MemorySegmentZeroUnsafe.panama       true       6  avgt   30  3.428 ± 0.008  ns/op
MemorySegmentZeroUnsafe.panama       true       7  avgt   30  3.482 ± 0.049  ns/op
MemorySegmentZeroUnsafe.panama       true       8  avgt   30  3.188 ± 0.013  ns/op
MemorySegmentZeroUnsafe.panama       true      15  avgt   30  4.612 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama       true      16  avgt   30  3.795 ± 0.004  ns/op
MemorySegmentZeroUnsafe.panama       true      63  avgt   30  5.376 ± 0.037  ns/op
MemorySegmentZeroUnsafe.panama       true      64  avgt   30  4.846 ± 0.033  ns/op
MemorySegmentZeroUnsafe.panama       true     255  avgt   30  7.723 ± 0.263  ns/op
MemorySegmentZeroUnsafe.panama       true     256  avgt   30  7.299 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama      false       1  avgt   30  2.883 ± 0.017  ns/op
MemorySegmentZeroUnsafe.panama      false       2  avgt   30  3.110 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       3  avgt   30  3.271 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       4  avgt   30  3.385 ± 0.009  ns/op
MemorySegmentZeroUnsafe.panama      false       5  avgt   30  3.268 ± 0.024  ns/op
MemorySegmentZeroUnsafe.panama      false       6  avgt   30  3.431 ± 0.010  ns/op
MemorySegmentZeroUnsafe.panama      false       7  avgt   30  3.459 ± 0.003  ns/op
MemorySegmentZeroUnsafe.panama      false       8  avgt   30  3.186 ± 0.005  ns/op
MemorySegmentZeroUnsafe.panama      false      15  avgt   30  4.614 ± 0.015  ns/op
MemorySegmentZeroUnsafe.panama      false      16  avgt   30  3.799 ± 0.006  ns/op
MemorySegmentZeroUnsafe.panama      false      63  avgt   30  5.282 ± 0.020  ns/op
MemorySegmentZeroUnsafe.panama      false      64  avgt   30  4.891 ± 0.012  ns/op
MemorySegmentZeroUnsafe.panama      false     255  avgt   30  8.038 ± 0.007  ns/op
MemorySegmentZeroUnsafe.panama      false     256  avgt   30  7.890 ± 0.108  ns/op
MemorySegmentZeroUnsafe.unsafe       true       1  avgt   30  3.785 ± 0.062  ns/op
MemorySegmentZeroUnsafe.unsafe       true       2  avgt   30  3.772 ± 0.075  ns/op
MemorySegmentZeroUnsafe.unsafe       true       3  avgt   30  3.433 ± 0.052  ns/op
MemorySegmentZeroUnsafe.unsafe       true       4  avgt   30  3.727 ± 0.172  ns/op
MemorySegmentZeroUnsafe.unsafe       true       5  avgt   30  3.414 ± 0.062  ns/op
MemorySegmentZeroUnsafe.unsafe       true       6  avgt   30  3.313 ± 0.117  ns/op
MemorySegmentZeroUnsafe.unsafe       true       7  avgt   30  3.198 ± 0.015  ns/op
MemorySegmentZeroUnsafe.unsafe       true       8  avgt   30  2.843 ± 0.158  ns/op
MemorySegmentZeroUnsafe.unsafe       true      15  avgt   30  3.278 ± 0.004  ns/op
MemorySegmentZeroUnsafe.unsafe       true      16  avgt   30  2.925 ± 0.113  ns/op
MemorySegmentZeroUnsafe.unsafe       true      63  avgt   30  3.800 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe       true      64  avgt   30  3.400 ± 0.050  ns/op
MemorySegmentZeroUnsafe.unsafe       true     255  avgt   30  7.032 ± 0.120  ns/op
MemorySegmentZeroUnsafe.unsafe       true     256  avgt   30  6.423 ± 0.013  ns/op
MemorySegmentZeroUnsafe.unsafe      false       1  avgt   30  3.645 ± 0.148  ns/op
MemorySegmentZeroUnsafe.unsafe      false       2  avgt   30  3.638 ± 0.152  ns/op
MemorySegmentZeroUnsafe.unsafe      false       3  avgt   30  3.377 ± 0.068  ns/op
MemorySegmentZeroUnsafe.unsafe      false       4  avgt   30  3.692 ± 0.119  ns/op
MemorySegmentZeroUnsafe.unsafe      false       5  avgt   30  3.436 ± 0.027  ns/op
MemorySegmentZeroUnsafe.unsafe      false       6  avgt   30  3.427 ± 0.038  ns/op
MemorySegmentZeroUnsafe.unsafe      false       7  avgt   30  3.192 ± 0.014  ns/op
MemorySegmentZeroUnsafe.unsafe      false       8  avgt   30  3.035 ± 0.046  ns/op
MemorySegmentZeroUnsafe.unsafe      false      15  avgt   30  3.294 ± 0.049  ns/op
MemorySegmentZeroUnsafe.unsafe      false      16  avgt   30  3.042 ± 0.061  ns/op
MemorySegmentZeroUnsafe.unsafe      false      63  avgt   30  3.579 ± 0.006  ns/op
MemorySegmentZeroUnsafe.unsafe      false      64  avgt   30  3.449 ± 0.035  ns/op
MemorySegmentZeroUnsafe.unsafe      false     255  avgt   30  8.633 ± 0.317  ns/op
MemorySegmentZeroUnsafe.unsafe      false     256  avgt   30  7.003 ±
0.085 ns/op ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2908459844 From thartmann at openjdk.org Mon May 26 05:02:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 05:02:53 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: <5ZAx-1gPLH7gqRxIahHxmCIgcs6veIyna530gicmsR0=.427a2072-f2a8-49d2-9a90-17e126e90e63@github.com> On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Testing all passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25389#issuecomment-2908531231 From thartmann at openjdk.org Mon May 26 05:35:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 05:35:54 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability In-Reply-To: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: <6OsU9gpjhuaflrB-93m3FjlTIdAkVQeVThrnWCAgc-M=.6071f477-6907-44bd-8300-6f55d37c575f@github.com> On Thu, 22 May 2025 08:54:42 GMT, Emanuel Peter wrote: > I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. 
The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. > - `0`: abort vectorization, as if it was not profitable. > - `1`: default, use profitability heuristics to determine if we should vectorize. > - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. > > In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. > > I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. > > Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on. > > And if you want a small test to experiment with, I have one at the end for you. > > **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. > > -------------------------------------- > > **Use-Case: investigate Reduction Heuristics** > > A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. > > This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. > - https://bugs.openjdk.org/browse/JDK-8078563 > - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html > From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors.
But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. > > But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions,... Looks good to me otherwise. src/hotspot/share/opto/c2_globals.hpp line 381: > 379: "Override the auto vectorization profitability heuristics." \ > 380: "0 = Run auto vectorizer, but abort just before applying" \ > 381: " vectrorization, as though it was not profitable." \ Suggestion: " vectorization, as though it was not profitable." \ src/hotspot/share/opto/c2_globals.hpp line 383: > 381: " vectrorization, as though it was not profitable." \ > 382: "1 = Run auto vectorizer with the default profitability" \ > 383: " heuristics. This is is the default, and hopefully" \ Suggestion: " heuristics. This is the default, and hopefully" \ src/hotspot/share/opto/superword.cpp line 1608: > 1606: if (is_marked_reduction(p0)) { > 1607: const Type *arith_type = p0->bottom_type(); > 1608: // This heuristic predicts that 2-element reductions for INT/LONG, predicting Needs rephrasing Suggestion: // This heuristic predicts 2-element reductions for INT/LONG, predicting src/hotspot/share/opto/superword.cpp line 1613: > 1611: // hence it is not directly clear that they are profitable. If we only have > 1612: // two elements per vector, then the performance gains from non-reduction > 1613: // vectors is at most going from 2 scalar instructions to 1 vector instruction. Suggestion: // vectors are at most going from 2 scalar instructions to 1 vector instruction. ------------- Changes requested by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25387#pullrequestreview-2867250557 PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106533479 PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106534591 PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106542788 PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106543287 From thartmann at openjdk.org Mon May 26 05:35:55 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 05:35:55 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability In-Reply-To: <6OsU9gpjhuaflrB-93m3FjlTIdAkVQeVThrnWCAgc-M=.6071f477-6907-44bd-8300-6f55d37c575f@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> <6OsU9gpjhuaflrB-93m3FjlTIdAkVQeVThrnWCAgc-M=.6071f477-6907-44bd-8300-6f55d37c575f@github.com> Message-ID: <2NKVQWxNQZS-rNdBPlm3tRCZvDkfCAlVsLvQmguH_nA=.2271fc60-76b7-4d51-be49-926fea9a9a45@github.com> On Mon, 26 May 2025 05:23:00 GMT, Tobias Hartmann wrote: >> I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. >> - `0`: abort vectorization, as if it was not profitable. >> - `1`: default, use profitability heuristics to determine if we should vectorize. >> - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. >> >> In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. 
>> >> I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. >> >> Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on. >> >> And if you want a small test to experiment with, I have one at the end for you. >> >> **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. >> >> -------------------------------------- >> >> **Use-Case: investigate Reduction Heuristics** >> >> A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. >> >> This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. >> - https://bugs.openjdk.org/browse/JDK-8078563 >> - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html >> From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. >> >> But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/1... > > src/hotspot/share/opto/c2_globals.hpp line 381: > >> 379: "Override the auto vectorization profitability heuristics." \ >> 380: "0 = Run auto vectorizer, but abort just before applying" \ >> 381: " vectrorization, as though it was not profitable."
\ > > Suggestion: > > " vectorization, as though it was not profitable." \ And shouldn't this be ".... as if it was not profitable"? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106534257 From chagedorn at openjdk.org Mon May 26 05:37:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 05:37:52 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: <_y8LMhDg7b8EuWLikdsmgK0nUCCh0Y2PH0LX_-TpsD4=.fa9bb975-a850-4223-825e-726dcc5a74f2@github.com> Message-ID: On Fri, 23 May 2025 22:43:06 GMT, Vladimir Ivanov wrote: >> That's a good point, there it really does not matter. Another thought I've just had: When you add it to the `CompilePhase` class in the IR framework and it would be nice to have the same name. There, it would be beneficial to have the full name since people then only see `CompilePhase::OPTIMIZE_RF`. > > That's fair. Renamed and added to `CompilePhase.java`. Thanks for the update! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2106545614 From thartmann at openjdk.org Mon May 26 05:39:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 05:39:52 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 09:07:17 GMT, Christian Hagedorn wrote: >> When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. 
There is a comment in `idealGraphPrinter.cpp` which says that at most 2 chars are allowed for numbers: >> https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 >> >> But we already allow larger entries today: >> ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) >> >> I therefore propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants. >> >> Without patch: >> ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) >> >> >> With patch: >> ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > better way to check nof chars, also print narrow oop null Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25393#pullrequestreview-2867271063 From chagedorn at openjdk.org Mon May 26 05:39:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 05:39:53 GMT Subject: RFR: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter [v2] In-Reply-To: References: <-wSA31a4omoLNUNuEBvmDcx3uv6-C5rNhCxNaO38pEE=.79e314cd-3400-4421-bc5a-2fc944bd117f@github.com> Message-ID: On Thu, 22 May 2025 14:39:53 GMT, Manuel Hässig wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Increase number of parameters as suggested by Manuel > > That is a nice convenience. Thank you for the improvement. > > It looks good to me :slightly_smiling_face: Thanks @mhaessig, @marc-chevalier and @TobiHartmann for your reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25393#issuecomment-2908589329 From epeter at openjdk.org Mon May 26 05:54:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 05:54:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: On Fri, 23 May 2025 15:33:48 GMT, Srinivas Vamsi Parasa wrote: >> Hi Jatin (@jatin-bhateja), >> >> Incorporated the changes suggested for cpu_family and is_P6_or_later() and other minor changes. Please let me know if everything looks good. >> >> Thanks, >> Vamsi > >> @vamsi-parasa Testing launched, ping me again in 24h :) > > Thanks Emanuel (@eme64)! Please let me know if there are any issues with the tests. @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2908612509 From epeter at openjdk.org Mon May 26 05:59:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 05:59:27 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v2] In-Reply-To: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: > I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. > - `0`: abort vectorization, as if it was not profitable.
> - `1`: default, use profitability heuristics to determine if we should vectorize. > - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. > > In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. > > I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. > > Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on. > > And if you want a small test to experiment with, I have one at the end for you. > > **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. > > -------------------------------------- > > **Use-Case: investigate Reduction Heuristics** > > A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. > > This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. > - https://bugs.openjdk.org/browse/JDK-8078563 > - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html > From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long.
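As background for the remark about moving reductions out of the loop, here is a minimal sketch in plain Java. It is a hand-written scalar model with two "lanes", not the code C2 emits, and the class and method names (`ReductionOrderSketch`, `sumInOrder`, `sumTwoLanes`) are made up for illustration. It shows why the transformation is safe for int (and long) sums, which wrap on overflow and can therefore be reordered, and why the same reordering can change the result of a float sum.

```java
final class ReductionOrderSketch {
    // Reduction inside the loop: one running sum, strictly in array order.
    static int sumInOrder(int[] a) {
        int sum = 0;
        for (int v : a) {
            sum += v;
        }
        return sum;
    }

    // Reduction moved out of the loop: the loop only updates per-lane partial
    // sums, and the lanes are combined once, after the loop.
    static int sumTwoLanes(int[] a) {
        int lane0 = 0;
        int lane1 = 0;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            lane0 += a[i];
            lane1 += a[i + 1];
        }
        int sum = lane0 + lane1;
        if (i < a.length) {
            sum += a[i]; // leftover element for odd lengths
        }
        return sum;
    }

    // Same two shapes for float, where addition is not associative.
    static float sumInOrder(float[] a) {
        float sum = 0f;
        for (float v : a) {
            sum += v;
        }
        return sum;
    }

    static float sumTwoLanes(float[] a) {
        float lane0 = 0f;
        float lane1 = 0f;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            lane0 += a[i];
            lane1 += a[i + 1];
        }
        float sum = lane0 + lane1;
        if (i < a.length) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ints = {Integer.MAX_VALUE, 1, -5, 7};
        // Overflow wraps, so both orders agree.
        System.out.println(sumInOrder(ints) == sumTwoLanes(ints)); // prints "true"

        float[] floats = {1e8f, 1f, -1e8f, 1f};
        // In order, the first 1f is lost to rounding next to 1e8f; per lane it is not.
        System.out.println(sumInOrder(floats) + " vs " + sumTwoLanes(floats)); // prints "1.0 vs 2.0"
    }
}
```

To compare the heuristics on a loop like `sumInOrder(int[])`, a JDK build that contains this patch could presumably be run with something like `java -XX:+UnlockDiagnosticVMOptions -XX:AutoVectorizationOverrideProfitability=2 ...`; that command line is an assumption based on the flag being described as diagnostic, not something quoted from the PR.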
> > But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions,... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25387/files - new: https://git.openjdk.org/jdk/pull/25387/files/54c626e0..3f2c2698 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25387/head:pull/25387 PR: https://git.openjdk.org/jdk/pull/25387 From epeter at openjdk.org Mon May 26 06:02:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 06:02:45 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: > I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. > - `0`: abort vectorization, as if it was not profitable. > - `1`: default, use profitability heuristics to determine if we should vectorize. > - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. > > In the future, we may change our heuristics. 
We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. > > I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. > > Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on. > > And if you want a small test to experiment with, I have one at the end for you. > > **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. > > -------------------------------------- > > **Use-Case: investigate Reduction Heuristics** > > A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. > > This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. > - https://bugs.openjdk.org/browse/JDK-8078563 > - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html > From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. > > But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 .
We cannot do that with float/double reductions,... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix wording ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25387/files - new: https://git.openjdk.org/jdk/pull/25387/files/3f2c2698..8da34fc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25387&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25387/head:pull/25387 PR: https://git.openjdk.org/jdk/pull/25387 From epeter at openjdk.org Mon May 26 06:04:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 06:04:55 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: On Fri, 23 May 2025 15:10:52 GMT, Vladimir Kozlov wrote: > This looks fine. One suggestion I have for separate RFE is to use UL for such outputs. @vnkozlov Thanks for the approval! About your suggestion: I suppose that means I would be refactoring all TraceSuperWord/TraceAutoVectorization to use UL. Is there now a good way to do the `CompileCommand` method-level filtering with UL? Because `TraceAutoVectorization` uses method-based filtering. @TobiHartmann I applied all your suggestions :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25387#issuecomment-2908630305 From dskantz at openjdk.org Mon May 26 06:08:55 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 26 May 2025 06:08:55 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" [v2] In-Reply-To: References: Message-ID: > This pull request contains a fix for JDK-8357105. 
> > The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. > > The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. > > Testing: > Tier1-4. > > Extra testing: > Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. Daniel Skantz has updated the pull request incrementally with two additional commits since the last revision: - Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/opto/stringopts.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25395/files - new: https://git.openjdk.org/jdk/pull/25395/files/1ed6dde9..65c331fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25395&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25395&range=00-01 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/25395.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25395/head:pull/25395 PR: https://git.openjdk.org/jdk/pull/25395 From epeter at openjdk.org Mon May 26 06:18:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 06:18:52 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v5] In-Reply-To:
<-LdLobbf_wMuaEd7e4ietBnJHhDBJFUWk7Hw2EdnmuY=.c7fa1951-8874-4f56-afe7-75341e748899@github.com> References: <-LdLobbf_wMuaEd7e4ietBnJHhDBJFUWk7Hw2EdnmuY=.c7fa1951-8874-4f56-afe7-75341e748899@github.com> Message-ID: On Thu, 22 May 2025 11:58:18 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. >> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Reduce @Warmup from 10000 to 50 Thanks for the work and patience :) Changes look reasonable, though I did not review them in detail. Tests have passed! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25096#pullrequestreview-2867351573 From dskantz at openjdk.org Mon May 26 06:33:09 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 26 May 2025 06:33:09 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" [v3] In-Reply-To: References: Message-ID: > This pull request contains a fix for JDK-8357105. > > The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. 
> > The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. > > Testing: > Tier1-4. > > Extra testing: > Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: add second run; - iteration count; add use of result ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25395/files - new: https://git.openjdk.org/jdk/pull/25395/files/65c331fb..a4996f6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25395&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25395&range=01-02 Stats: 9 lines in 1 file changed: 6 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25395.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25395/head:pull/25395 PR: https://git.openjdk.org/jdk/pull/25395 From epeter at openjdk.org Mon May 26 06:35:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 06:35:55 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v2] In-Reply-To: <6FOXAuT4a0YIf6D0s6bUqpD2vsV-KUySZ8gVo7KK9PU=.e7e8b30f-8202-44ac-a8f6-b04e252f400d@github.com> References: <6FOXAuT4a0YIf6D0s6bUqpD2vsV-KUySZ8gVo7KK9PU=.e7e8b30f-8202-44ac-a8f6-b04e252f400d@github.com> Message-ID: On Fri, 23 May 2025 08:05:29 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1911: >> >>> 1909: } else if (loop_head->has_exact_trip_count() && init->is_Con()) { >>> 1910: // We should not be here if we have old_trip_count == max_juint >>> 1911: // it would make trip_count == 2^31 which causes overflow and the situation is overall weird >> >> Can you say something a little more specific than "weird"? As a reader, it's not immediately clear what that may imply. 
> > I've tried to explain the concerns better but tbh, it's also part of the weirdness: it's not clear how inconsistent the state would be and what would and should happen. Thanks for the additional context :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2106648745 From mhaessig at openjdk.org Mon May 26 06:36:56 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Mon, 26 May 2025 06:36:56 GMT Subject: RFR: 8357649: IGV: add block index to the supplemental node properties In-Reply-To: References: Message-ID: On Fri, 23 May 2025 13:42:12 GMT, Manuel Hässig wrote: > This PR adds the block index to IGV node properties as soon as the CFG has been scheduled. This is really handy when working on peepholes, where one has to work with block indices. > > ![Screenshot from 2025-05-23 15-35-29](https://github.com/user-attachments/assets/1a895e07-cf5f-4eed-afd0-08fe26cf0266) > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions?query=branch%3AJDK-8357649-block-index) > - [x] tier1, tier2, and some Oracle internal testing on Oracle supported platforms and OSs > > Shout out to @robcasloz for coming up with an initial version of this patch. Thank you both for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25414#issuecomment-2908709413 From duke at openjdk.org Mon May 26 06:36:57 2025 From: duke at openjdk.org (duke) Date: Mon, 26 May 2025 06:36:57 GMT Subject: RFR: 8357649: IGV: add block index to the supplemental node properties In-Reply-To: References: Message-ID: On Fri, 23 May 2025 13:42:12 GMT, Manuel Hässig wrote: > This PR adds the block index to IGV node properties as soon as the CFG has been scheduled. This is really handy when working on peepholes, where one has to work with block indices.
> > ![Screenshot from 2025-05-23 15-35-29](https://github.com/user-attachments/assets/1a895e07-cf5f-4eed-afd0-08fe26cf0266) > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions?query=branch%3AJDK-8357649-block-index) > - [x] tier1, tier2, and some Oracle internal testing on Oracle supported platforms and OSs > > Shout out to @robcasloz for coming up with an initial version of this patch. @mhaessig Your change (at version bb149db5962bb9959adc139797992c8b3ab2bb7e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25414#issuecomment-2908711039 From chagedorn at openjdk.org Mon May 26 06:52:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 06:52:53 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v3] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 08:09:36 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Comments src/hotspot/share/opto/loopTransform.cpp line 1922: > 1920: // Let's check we are in a surprise-free situation, that should be the only one reachable > 1921: // here. => old_trip_count was set, is reliable, and is small enough to be sure that `stride_con` > 1922: // will also be small enough, and no overflow risk. Can't we just say that the old trip count is only set as exact in `compute_trip_count()` if it is less than `max_juint` and otherwise, it's inexact and we don't enter this path? You could also mention that adding `stride_m` is then overflow safe.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2106667315 From epeter at openjdk.org Mon May 26 06:58:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 06:58:53 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Fri, 9 May 2025 07:35:41 GMT, Xiaohong Gong wrote: > JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. Furthermore, generating `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on > architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. 
This simplifies backend implementation, reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: > > Benchmark Mode Cnt Unit SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr... It seems reasonable. @jatin-bhateja should definitively look over this, or someone else from Intel who knows this code well :) I'll launch some testing now :) src/hotspot/share/opto/vectorIntrinsics.cpp line 1203: > 1201: // Class> vectorIndexClass, int indexLength, > 1202: // Object base, long offset, > 1203: // W indexVector1, W index_vector2, W index_vector3, W index_vector4, Are we doing a mix of `CamelCase` and `with_underscore` on purpose? 
src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 495: > 493: Class> vectorIndexClass, > 494: int indexLength, Object base, long offset, > 495: W indexVector1, W indexVector2, W indexVector3, W indexVector4, Here you went for all `camelCase`, just an observation :) ------------- PR Review: https://git.openjdk.org/jdk/pull/25138#pullrequestreview-2867447489 PR Review Comment: https://git.openjdk.org/jdk/pull/25138#discussion_r2106674102 PR Review Comment: https://git.openjdk.org/jdk/pull/25138#discussion_r2106673987 From mchevalier at openjdk.org Mon May 26 07:00:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 May 2025 07:00:36 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v4] In-Reply-To: References: Message-ID: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. 
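[Editorial sketch, not part of the original message.] The arithmetic quoted above can be checked with a few lines of Java. This is only a minimal illustration of the two bounds, assuming `max_juint` is the largest unsigned 32-bit value (2^32 - 1); the class and method names are hypothetical and do not appear in HotSpot.

```java
// Hypothetical helper, only to check the arithmetic quoted in the PR description.
final class TripCountBoundSketch {
    // Assumption: max_juint is the largest unsigned 32-bit value, 2^32 - 1.
    static final long MAX_JUINT = 0xFFFFFFFFL;

    // Strict upper bound used by the old assert: (julong)max_juint / 2 == 2^31 - 1.
    static long oldStrictBound() {
        return MAX_JUINT / 2;
    }

    // Inclusive upper bound the PR description argues for: 2^31.
    static long proposedInclusiveBound() {
        return 1L << 31;
    }

    static boolean passesOldAssert(long tripCount) {
        return tripCount < oldStrictBound();
    }

    static boolean passesProposedAssert(long tripCount) {
        return tripCount <= proposedInclusiveBound();
    }

    public static void main(String[] args) {
        System.out.println("old strict bound:         " + oldStrictBound());
        System.out.println("proposed inclusive bound: " + proposedInclusiveBound());
        // Values in [2^31 - 1, 2^31] are rejected by the old check only.
        for (long t = (1L << 31) - 2; t <= (1L << 31) + 1; t++) {
            System.out.println(t + " old=" + passesOldAssert(t)
                    + " proposed=" + passesProposedAssert(t));
        }
    }
}
```

Under these assumptions, the trip counts 2^31 - 1 and 2^31 are the two values that the old check rejects and the relaxed bound accepts.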
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: +how the assert is true ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25295/files - new: https://git.openjdk.org/jdk/pull/25295/files/4da34fca..a5552ffd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25295/head:pull/25295 PR: https://git.openjdk.org/jdk/pull/25295 From mchevalier at openjdk.org Mon May 26 07:00:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 May 2025 07:00:36 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 06:46:32 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Comments > > src/hotspot/share/opto/loopTransform.cpp line 1922: > >> 1920: // Let's check we are in a surprise-free situation, that should be the only one reachable >> 1921: // here. => old_trip_count was set, is reliable, and is small enough to be sure that `stride_con` >> 1922: // will also be small enough, and no overflow risk. > > Can't we just say that the old trip count is only set as exact in `compute_trip_count()` if it is less than `max_juint` and otherwise, it's inexact and we don't enter this path? You could also mention that adding `stride_m` is then overflow safe. We can say that, of course, but that tells what it is (for now), but doesn't really explain (or mention) that's it's actually an important property, and not just something that happens by chance. 
If I'm to change `compute_trip_count()` (or add another call to `set_exact_trip_count` and I read about that, I need to pay attention to still make the assert pass because I need this invariant. I should not remove it thinking "well, not true anymore, but it's fine". So I'd say saying that it's set in `compute_trip_count()` is the why this assert should pass, but not why it's important it passes. I'll add what you suggest, of course. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2106678090 From mchevalier at openjdk.org Mon May 26 07:00:36 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Mon, 26 May 2025 07:00:36 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 06:53:46 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/loopTransform.cpp line 1922: >> >>> 1920: // Let's check we are in a surprise-free situation, that should be the only one reachable >>> 1921: // here. => old_trip_count was set, is reliable, and is small enough to be sure that `stride_con` >>> 1922: // will also be small enough, and no overflow risk. >> >> Can't we just say that the old trip count is only set as exact in `compute_trip_count()` if it is less than `max_juint` and otherwise, it's inexact and we don't enter this path? You could also mention that adding `stride_m` is then overflow safe. > > We can say that, of course, but that tells what it is (for now), but doesn't really explain (or mention) that's it's actually an important property, and not just something that happens by chance. If I'm to change `compute_trip_count()` (or add another call to `set_exact_trip_count` and I read about that, I need to pay attention to still make the assert pass because I need this invariant. I should not remove it thinking "well, not true anymore, but it's fine". 
So I'd say saying that it's set in `compute_trip_count()` is the why this assert should pass, but not why it's important it passes. > > I'll add what you suggest, of course. I've tried a phrasing. Feel free to take a look! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25295#discussion_r2106682843 From roland at openjdk.org Mon May 26 07:20:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:20:10 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop In-Reply-To: <2T83tcCFOvCp4BElLS5ufAb7RR2ZkEUFXPOWEOAhVYg=.ed4262e0-5f36-4f95-9c4c-b4f750b6e555@github.com> References: <2T83tcCFOvCp4BElLS5ufAb7RR2ZkEUFXPOWEOAhVYg=.ed4262e0-5f36-4f95-9c4c-b4f750b6e555@github.com> Message-ID: On Fri, 23 May 2025 06:55:33 GMT, Tobias Hartmann wrote: > That looks good to me but given that we had quite a few bugs in that area in the past, I would suggest to only integrate into JDK 26 after the fork on June 05, 2025. Sounds reasonable to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25396#issuecomment-2908792492 From syan at openjdk.org Mon May 26 07:20:13 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 26 May 2025 07:20:13 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 06:33:09 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8357105. >> >> The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. 
In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. >> >> The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. >> >> Testing: >> Tier1-4. >> >> Extra testing: >> Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > add second run; - iteration count; add use of result LGTM ------------- Marked as reviewed by syan (Committer). PR Review: https://git.openjdk.org/jdk/pull/25395#pullrequestreview-2867466092 From xgong at openjdk.org Mon May 26 07:20:30 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 26 May 2025 07:20:30 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Mon, 26 May 2025 06:51:12 GMT, Emanuel Peter wrote: >> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs for X86 platforms [1]. However, the current implementation is not optimal for AArch64 SVE platform, which natively supports vector instructions for subword gather load operations using an int vector for indices (see [2][3]). >> >> Two key areas require improvement: >> 1. At the Java level, vector indices generated for range validation could be reused for the subsequent gather load operation on architectures with native vector instructions like AArch64 SVE. However, the current implementation prevents compiler reuse of these index vectors due to divergent control flow, potentially impacting performance. >> 2. At the compiler IR level, the additional `offset` input for `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR complexity and complicates backend implementation. 
Furthermore, generating `add` instructions before each memory access negatively impacts performance. >> >> This patch refactors the implementation at both the Java level and compiler mid-end to improve efficiency and maintainability across different architectures. >> >> Main changes: >> 1. Java-side API refactoring: >> - Explicitly passes generated index vectors to hotspot, eliminating duplicate index vectors for gather load instructions on >> architectures like AArch64. >> 2. C2 compiler IR refactoring: >> - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword types by removing the memory offset input and incorporating it into the memory base `addr` at the IR level. This simplifies backend implementation, reduces add operations, and unifies the IR across all types. >> 3. Backend changes: >> - Streamlines X86 implementation of subword gather operations following the removal of the offset input from the IR level. >> >> Performance: >> The performance of the relative JMH improves up to 27% on a X86 AVX512 system. Please see the data below: >> >> Benchmark Mode Cnt Unit SIZE Before After Gain >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 64 53682.012 52650.325 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 256 14484.252 14255.156 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 1024 3664.900 3595.615 0.98 >> GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms 4096 908.31... > > src/hotspot/share/opto/vectorIntrinsics.cpp line 1203: > >> 1201: // Class> vectorIndexClass, int indexLength, >> 1202: // Object base, long offset, >> 1203: // W indexVector1, W index_vector2, W index_vector3, W index_vector4, > > Are we doing a mix of `CamelCase` and `with_underscore` on purpose? This would be a naming style issue. Thanks for pointing out. I will fix it once I update this PR. It should be aligned with definition in `VectorSupport.java`. 
> src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java line 495: > >> 493: Class> vectorIndexClass, >> 494: int indexLength, Object base, long offset, >> 495: W indexVector1, W indexVector2, W indexVector3, W indexVector4, > > Here you went for all `camelCase`, just an observation :) Yeah, I choose to use `camelCase` style here to align with other parameter naming styles in this method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25138#discussion_r2106693179 PR Review Comment: https://git.openjdk.org/jdk/pull/25138#discussion_r2106692107 From thartmann at openjdk.org Mon May 26 07:20:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 07:20:29 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: <2NKVQWxNQZS-rNdBPlm3tRCZvDkfCAlVsLvQmguH_nA=.2271fc60-76b7-4d51-be49-926fea9a9a45@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> <6OsU9gpjhuaflrB-93m3FjlTIdAkVQeVThrnWCAgc-M=.6071f477-6907-44bd-8300-6f55d37c575f@github.com> <2NKVQWxNQZS-rNdBPlm3tRCZvDkfCAlVsLvQmguH_nA=.2271fc60-76b7-4d51-be49-926fea9a9a45@github.com> Message-ID: On Mon, 26 May 2025 05:23:58 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/c2_globals.hpp line 381: >> >>> 379: "Override the auto vectorization profitability heuristics." \ >>> 380: "0 = Run auto vectorizer, but abort just before applying" \ >>> 381: " vectrorization, as though it was not profitable." \ >> >> Suggestion: >> >> " vectorization, as though it was not profitable." \ > > And shouldn't this be ".... as if it was not profitable"? I think you missed my second comment here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106687368 From epeter at openjdk.org Mon May 26 07:20:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 07:20:33 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> <6OsU9gpjhuaflrB-93m3FjlTIdAkVQeVThrnWCAgc-M=.6071f477-6907-44bd-8300-6f55d37c575f@github.com> <2NKVQWxNQZS-rNdBPlm3tRCZvDkfCAlVsLvQmguH_nA=.2271fc60-76b7-4d51-be49-926fea9a9a45@github.com> Message-ID: On Mon, 26 May 2025 06:59:30 GMT, Tobias Hartmann wrote: >> And shouldn't this be ".... as if it was not profitable"? > > I think you missed my second comment here. @TobiHartmann I did miss it! This is what I get when I google "as if vs as though": > "As if" and "as though" are generally interchangeable and have the same meaning; they are both used to compare a real situation to an imaginary or hypothetical one. "As if" is slightly more common than "as though," but both are used to express a comparison, often where something appears to be the case but may not be. Would you still like me to change it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106692631 From roland at openjdk.org Mon May 26 07:21:08 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:21:08 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code [v2] In-Reply-To: References: Message-ID: > In the test case, a non escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the element of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. As I understand, the `Allocate` for the array should be > marked by EA as destination of an array copy. 
That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has same source and > destination. Removing that logic fixes the failure. Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopySameSrcDstInitializesNonEscapingArray.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopySameSrcDstInitializesNonEscapingArray.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25389/files - new: https://git.openjdk.org/jdk/pull/25389/files/d9ed1f1e..d8b31737 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25389&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25389&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25389/head:pull/25389 PR: https://git.openjdk.org/jdk/pull/25389 From roland at openjdk.org Mon May 26 07:21:19 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:21:19 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: <14ISbyfFWRwTEptGP6f9zzLdcCW9-JnkBmqYE3Fh37s=.9c2e9da1-d036-455d-a5da-84cd64cf4e9d@github.com> References: <14ISbyfFWRwTEptGP6f9zzLdcCW9-JnkBmqYE3Fh37s=.9c2e9da1-d036-455d-a5da-84cd64cf4e9d@github.com> Message-ID: On Fri, 23 May 2025 04:49:18 GMT, Vladimir Kozlov wrote: >> In the test case, a non escaping array is initialized by an >> `arraycopy` that uses this array as source and destination. Following >> the `arraycopy`, one of the element of the array is tested for >> `null`. That null check is constant folded to always `null` by escape >> analysis. 
As I understand, the `Allocate` for the array should be >> marked by EA as destination of an array copy. That state should then >> be propagated by EA to uses and all destinations of an array copy >> should be marked as unknown value. But EA has logic that explicitly >> skips the case where an `ArrayCopy` has same source and >> destination. Removing that logic fixes the failure. > > On other hand we may not propagate global-escape state which may affect locking optimizations. > Okay your fix is good. @vnkozlov thanks for the comments and review @TobiHartmann @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/25389#issuecomment-2908784418 From roland at openjdk.org Mon May 26 07:24:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:24:43 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v2] In-Reply-To: References: Message-ID: > `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because > we ran into some issues where a `Type` node is sunk and then becomes > `top` but the control path of its uses doesn't become unreachable. > > 8349479 should have fixed that so that exception no longer makes > sense. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25396/files - new: https://git.openjdk.org/jdk/pull/25396/files/8a3f8cd3..bebfec3f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25396&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25396&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25396/head:pull/25396 PR: https://git.openjdk.org/jdk/pull/25396 From chagedorn at openjdk.org Mon May 26 07:28:37 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 07:28:37 GMT Subject: Integrated: 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter In-Reply-To: References: Message-ID: On Thu, 22 May 2025 13:46:39 GMT, Christian Hagedorn wrote: > When using the "Condense graph" filter in IGV, it would be useful to show `NULL` and numbers wider than 2 characters instead of `P` and `I/L` (fallback for larger numbers), respectively. There is a comment in `idealGraphPrinter.cpp` which says that at most 2 chars are allowed for numbers: > https://github.com/openjdk/jdk/blob/428d33ef3ca0af34d8f164fe9d9b722e81e866a7/src/hotspot/share/opto/idealGraphPrinter.cpp#L646 > > But we already allow larger entries today: > ![image](https://github.com/user-attachments/assets/e90d0518-148f-4a33-a9e8-0bdca14aa017) > > I therefore propose to use `NULL` and allow up to 4 characters for numbers which could be a good trade-off between shortness and expressiveness. This allows us to quickly see null checks and larger constants.
> > Without patch: > ![image](https://github.com/user-attachments/assets/e450f1d2-503c-4b84-8137-25892f8ab7f9) > > > With patch: > ![image](https://github.com/user-attachments/assets/81371a53-be7e-4acd-afbf-e5613e96815a) > > Thanks, > Christian This pull request has now been integrated. Changeset: 99f33b4d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/99f33b4d9b91c71ec032dc47ed0b98e4419ac432 Stats: 19 lines in 1 file changed: 6 ins; 4 del; 9 mod 8357568: IGV: Show NULL and numbers up to 4 characters in "Condense graph" filter Reviewed-by: thartmann, mchevalier, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25393 From roland at openjdk.org Mon May 26 07:29:32 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:29:32 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v2] In-Reply-To: <6x8dbtEZAnT_977LpZL_0SLggYCz9Q8IgQYbYWkoQuI=.f2c683a2-7cf6-46c3-ae03-f1ee117efd75@github.com> References: <6x8dbtEZAnT_977LpZL_0SLggYCz9Q8IgQYbYWkoQuI=.f2c683a2-7cf6-46c3-ae03-f1ee117efd75@github.com> Message-ID: On Fri, 23 May 2025 07:12:53 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopopts.cpp line 1688: > >> 1686: !n->is_OpaqueInitializedAssertionPredicate() && >> 1687: !n->is_OpaqueTemplateAssertionPredicate() && >> 1688: !n->is_Type()) { > > I cannot remember exactly, how often was it a problem without JDK-8349479? If it was more common, we might want to only allow it when `KillPathsReachableByDeadTypeNode` is set. I made that change. As far as I remember, the logic removed by JDK-8319372 played a key role in those failures. Not sure if any were still reproducible after that one.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25396#discussion_r2106726011 From jkarthikeyan at openjdk.org Mon May 26 07:29:51 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 26 May 2025 07:29:51 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short Message-ID: Hi all, This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
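[Editorial sketch, not part of the original message.] The truncation problem described above can be illustrated in plain Java. The following is a minimal sketch, assuming a simplified model in which a "16-bit lane" sees only the low 16 bits of the input; the class and method names are hypothetical, and this is not how C2 represents vectors internally. It only shows why the `Integer` methods named in the subject line give different answers when the input is treated as a `short`-wide value instead of a sign-extended `int`.

```java
// Hypothetical model of the miscompilation: the "lane" methods imitate what a
// 16-bit vector lane would compute if the input type were wrongly truncated.
final class SubwordTruncationSketch {
    // Java semantics: the short argument is widened (sign-extended) to int first.
    static int leadingZerosAsInt(short s) {
        return Integer.numberOfLeadingZeros(s);
    }

    // Assumed 16-bit lane model: only the low 16 bits of the value are visible.
    static int leadingZerosIn16BitLane(short s) {
        return Integer.numberOfLeadingZeros(s & 0xFFFF) - 16;
    }

    // Java semantics: a negative short has all upper int bits set after widening.
    static int bitCountAsInt(short s) {
        return Integer.bitCount(s);
    }

    // Assumed 16-bit lane model: the sign-extension bits are not counted.
    static int bitCountIn16BitLane(short s) {
        return Integer.bitCount(s & 0xFFFF);
    }

    public static void main(String[] args) {
        short one = 1;
        short minusOne = -1;
        System.out.println("nlz(1):       int=" + leadingZerosAsInt(one)
                + " lane16=" + leadingZerosIn16BitLane(one));
        System.out.println("bitCount(-1): int=" + bitCountAsInt(minusOne)
                + " lane16=" + bitCountIn16BitLane(minusOne));
    }
}
```

In this model the results differ even after narrowing the output back to `short`, which is why such operations cannot simply be given a subword vector type; keeping the vector element type at `int`, as the patch describes, preserves the scalar semantics.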
------------- Commit messages: - Fix vector truncation with subword types Changes: https://git.openjdk.org/jdk/pull/25440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25440&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8350177 Stats: 284 lines in 2 files changed: 282 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25440/head:pull/25440 PR: https://git.openjdk.org/jdk/pull/25440 From roland at openjdk.org Mon May 26 07:31:10 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:31:10 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code In-Reply-To: <14ISbyfFWRwTEptGP6f9zzLdcCW9-JnkBmqYE3Fh37s=.9c2e9da1-d036-455d-a5da-84cd64cf4e9d@github.com> References: <14ISbyfFWRwTEptGP6f9zzLdcCW9-JnkBmqYE3Fh37s=.9c2e9da1-d036-455d-a5da-84cd64cf4e9d@github.com> Message-ID: On Fri, 23 May 2025 04:49:18 GMT, Vladimir Kozlov wrote: >> In the test case, a non escaping array is initialized by an >> `arraycopy` that uses this array as source and destination. Following >> the `arraycopy`, one of the element of the array is tested for >> `null`. That null check is constant folded to always `null` by escape >> analysis. As I understand, the `Allocate` for the array should be >> marked by EA as destination of an array copy. That state should then >> be propagated by EA to uses and all destinations of an array copy >> should be marked as unknown value. But EA has logic that explicitly >> skips the case where an `ArrayCopy` has same source and >> destination. Removing that logic fixes the failure. > > On other hand we may not propagate global-escape state which may affect locking optimizations. > Okay your fix is good. @vnkozlov @TobiHartmann @chhagedorn I need someone to re-approve the PR because I added Christian's suggestions. 
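[Editorial sketch, not part of the original mail.] A minimal Java method with the shape described in the quoted PR text for 8356989: a non-escaping array is initialized by an `arraycopy` whose source and destination are the same array, and one element is then null-checked. This is a guess at the pattern, assuming nothing about the actual regression test; the class and method names are hypothetical.

```java
public class ArrayCopySameSrcDstSketch {
    // Returns true only if the copied element is null.
    static boolean copiedElementIsNull(Object o) {
        Object[] a = new Object[2];          // allocation does not escape this method
        a[0] = o;
        System.arraycopy(a, 0, a, 1, 1);     // same array is both source and destination
        return a[1] == null;                 // must not be folded to "always null"
    }

    public static void main(String[] args) {
        System.out.println(copiedElementIsNull(new Object()));
        System.out.println(copiedElementIsNull(null));
    }
}
```

If the compiler treated the destination elements as still holding their initial null, the first call would wrongly return true; the interpreter returns false for it.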
------------- PR Comment: https://git.openjdk.org/jdk/pull/25389#issuecomment-2908826334 From roland at openjdk.org Mon May 26 07:45:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:45:31 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v29] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. 
Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningIntLoopWithLongChecksPredicates.java Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/predicates.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/d409deb4..42f1ecfd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=27-28 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Mon May 26 07:50:52 2025 From: roland at openjdk.org
(Roland Westrelin) Date: Mon, 26 May 2025 07:50:52 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v30] In-Reply-To: References: Message-ID: <1InsaK0ytsg0dMv6oFdUSUBink7Vs0qSvukV5LA6I2Q=.82f6ac15-8c05-45eb-b058-a9b6d2ec9df4@github.com> > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. 
Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoop.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/42f1ecfd..42cd7091 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=28-29 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Mon May 26 07:53:33 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 07:53:33 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v31] In-Reply-To:
References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. 
Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with three additional commits since the last revision: - Update test/micro/org/openjdk/bench/java/lang/foreign/HeapMismatchManualLoopTest.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopScaleOverflow.java Co-authored-by: Christian Hagedorn - Update test/hotspot/jtreg/compiler/longcountedloops/TestShortRunningLongCountedLoopPredicatesClone.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/42cd7091..43e8a14f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=29-30 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From fyang at openjdk.org Mon May 26 08:15:42 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 26 May 2025 08:15:42 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into
match_rule_supported_vector In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 02:52:01 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 Looks fine to me. Thanks for the cleanup. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25438#pullrequestreview-2867628006 From chagedorn at openjdk.org Mon May 26 08:15:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 08:15:53 GMT Subject: RFR: 8356989: Unexpected null in C2 compiled code [v2] In-Reply-To: References: Message-ID: <9SJIYI4UPlMZqm_RbkS8rlbF5LOJAz_6FaE74HeHEkE=.56af1d00-743f-4540-ab4e-d083d231b89b@github.com> On Mon, 26 May 2025 07:21:08 GMT, Roland Westrelin wrote: >> In the test case, a non escaping array is initialized by an >> `arraycopy` that uses this array as source and destination. Following >> the `arraycopy`, one of the element of the array is tested for >> `null`. That null check is constant folded to always `null` by escape >> analysis. As I understand, the `Allocate` for the array should be >> marked by EA as destination of an array copy. That state should then >> be propagated by EA to uses and all destinations of an array copy >> should be marked as unknown value. But EA has logic that explicitly >> skips the case where an `ArrayCopy` has same source and >> destination. Removing that logic fixes the failure. 
> > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopySameSrcDstInitializesNonEscapingArray.java > > Co-authored-by: Christian Hagedorn > - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestArrayCopySameSrcDstInitializesNonEscapingArray.java > > Co-authored-by: Christian Hagedorn Looks good, thanks for the update. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25389#pullrequestreview-2867631153 From qamai at openjdk.org Mon May 26 08:15:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 26 May 2025 08:15:56 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:15:31 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. > > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! 
src/hotspot/share/opto/superword.cpp line 2496: > 2494: int opc = in->Opcode(); > 2495: return opc == Op_AddI || opc == Op_SubI || opc == Op_MulI || opc == Op_AndI || opc == Op_OrI || opc == Op_XorI > 2496: || opc == Op_ReverseBytesS || opc == Op_ReverseBytesUS; Are you sure? I don't think you can truncate a `ReverseByteS` to a `byte`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2106775202 From lucy at openjdk.org Mon May 26 08:16:09 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 26 May 2025 08:16:09 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 04:11:46 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 
0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > switch to vector stores Marked as reviewed by lucy (Reviewer). src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1499: > 1497: > 1498: __ z_vlvgb(Z_V0, byteVal, 0); > 1499: __ z_vrepb(Z_V0, Z_V0, 0); You could also use `z_vzero(Vreg)` to preload the vector register with all zeroes. Saves an instruction. 
------------- PR Review: https://git.openjdk.org/jdk/pull/24480#pullrequestreview-2867547778 PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2106740786 From lucy at openjdk.org Mon May 26 08:16:16 2025 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 26 May 2025 08:16:16 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v4] In-Reply-To: References: Message-ID: <9m_jULDk7nnTrUD6JpyD3rgbIPkzyEvip9PbnkFOeRg=.b3e0ce87-0b1b-4d2b-93a8-56422c94e8c0@github.com> On Fri, 23 May 2025 08:18:12 GMT, Andrew Haley wrote: >> Ah, thanks! I was not aware of that. That means the current implementation is probably wrong in some cases (mvc generated by gcc). Or is mvc only used in the single Byte aligned case? > >> Ah, thanks! I was not aware of that. That means the current implementation is probably wrong in some cases (mvc generated by gcc). Or is mvc only used in the single Byte aligned case? > > Yes, that's right, just for the byte-aligned case. The atomicity spec cited by @theRealAph severely limits the optimisation options. Depending on the data alignment, you have to use 8, 4, or 2-byte stores. Only for the unaligned case there are no hard restrictions, just the soft "let's be nice" conventions. With that said, the vector implementation should be ok. It is just not as nice as a byte store loop. There could be as many as 15 uninitialised bytes if just the last byte of a vector store is not writable. I would take that risk. 
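[Editorial sketch, not part of the original mail.] The alignment rule discussed above (8-, 4-, or 2-byte stores depending on alignment, byte stores otherwise) can be written down as a small Java model. It mirrors the dispatch visible at the top of the quoted stub, which ORs two registers and tests the low bits; that those registers hold the address and the size is an assumption, and the class and method names here are hypothetical, not HotSpot code.

```java
public class SetMemoryWidthSketch {
    // Widest store unit (in bytes) such that both address and size are multiples of it.
    static int storeWidth(long address, long size) {
        long bits = address | size;   // a low bit set in either value limits the width
        if ((bits & 7) == 0) return 8;
        if ((bits & 3) == 0) return 4;
        if ((bits & 1) == 0) return 2;
        return 1;
    }

    // Replicates the low byte of value into all eight bytes of a long fill pattern.
    static long replicateByte(int value) {
        return (value & 0xFFL) * 0x0101010101010101L;
    }

    public static void main(String[] args) {
        System.out.println(storeWidth(0x1000, 64));
        System.out.println(storeWidth(0x1004, 8));
        System.out.println(storeWidth(0x1002, 6));
        System.out.println(storeWidth(0x1001, 8));
        System.out.println(Long.toHexString(replicateByte(0xAB)));
    }
}
```

Only the width-1 case is free of atomicity constraints in this model, which matches the remark that the unaligned case is the one where wider (for example vector) stores are an option.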
------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2908902764 From aboldtch at openjdk.org Mon May 26 08:16:17 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 26 May 2025 08:16:17 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 11:10:15 GMT, Aleksey Shipilev wrote: >> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. >> >> The code tries to switch a weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. >> >> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. >> >> It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More touchups Just a drive by comment. Not sure what our opinion is w.r.t. `mutable`, but how do we feel about typing the spin lock as `mutable` and keep `is_safe()` and `method*()` const. We can then keep the old signature for `CompileTask::is_unloaded()` `CompileTask::method()` and `ArenaStatCounter::ArenaStatCounter(...)`. ------------- PR Review: https://git.openjdk.org/jdk/pull/24018#pullrequestreview-2867592646 From mhaessig at openjdk.org Mon May 26 08:18:11 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Mon, 26 May 2025 08:18:11 GMT Subject: Integrated: 8357649: IGV: add block index to the supplemental node properties In-Reply-To: References: Message-ID: On Fri, 23 May 2025 13:42:12 GMT, Manuel H?ssig wrote: > This PR adds the block index to IGV node properties as soon as the CFG has been scheduled. This is really handy when working on peepholes, where one has to work with block indices. > > ![Screenshot from 2025-05-23 15-35-29](https://github.com/user-attachments/assets/1a895e07-cf5f-4eed-afd0-08fe26cf0266) > > Testing: > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions?query=branch%3AJDK-8357649-block-index) > - [x] tier1, tier2, and some Oracle internal testing on Oracle supported platfors and OSs > > Shout out to @robcasloz for coming up with an initial version of this patch. This pull request has now been integrated. 
Changeset: a37e8265 Author: Manuel Hässig Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/a37e8265b53b35c0b7f3ce9f4df9b2efcde322be Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod 8357649: IGV: add block index to the supplemental node properties Co-authored-by: Roberto Castañeda Lozano Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/25414 From chagedorn at openjdk.org Mon May 26 08:19:17 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 08:19:17 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v2] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:24:43 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because >> we ran into some issues where a `Type` node is sunk and then becomes >> `top` but the control path of its uses doesn't become unreachable. >> >> 8349479 should have fixed that so that exception no longer makes >> sense. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by chagedorn (Reviewer). 
src/hotspot/share/opto/loopopts.cpp line 1692: > 1690: !n->is_OpaqueInitializedAssertionPredicate() && > 1691: !n->is_OpaqueTemplateAssertionPredicate() && > 1692: !is_raw_to_oop_cast &&// don't extend live ranges of raw oops Suggestion: !is_raw_to_oop_cast && // don't extend live ranges of raw oops ------------- PR Review: https://git.openjdk.org/jdk/pull/25396#pullrequestreview-2867639447 PR Review Comment: https://git.openjdk.org/jdk/pull/25396#discussion_r2106802829 From chagedorn at openjdk.org Mon May 26 08:19:23 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 08:19:23 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v2] In-Reply-To: References: <6x8dbtEZAnT_977LpZL_0SLggYCz9Q8IgQYbYWkoQuI=.f2c683a2-7cf6-46c3-ae03-f1ee117efd75@github.com> Message-ID: On Mon, 26 May 2025 07:27:14 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopopts.cpp line 1688: >> >>> 1686: !n->is_OpaqueInitializedAssertionPredicate() && >>> 1687: !n->is_OpaqueTemplateAssertionPredicate() && >>> 1688: !n->is_Type()) { >> >> I cannot remember exactly, how often was it a problem without JDK-8349479? If it was more common, we might want to only allow it when `KillPathsReachableByDeadTypeNode` is set. > > I made that change. > As far as I remember, the logic removed by JDK-8319372 played a key role in those failures. Not sure if any were still reproducible after that one. Yes, that matches what I remember. Maybe JDK-8319372 can now be reverted with JDK-8349479 in? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25396#discussion_r2106802008 From roland at openjdk.org Mon May 26 08:20:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 08:20:50 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. 
Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/43e8a14f..b1da1b13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=30-31 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From chagedorn at openjdk.org Mon May 26 08:20:57 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 26 May 2025 08:20:57 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 08:18:05 GMT, Roland Westrelin wrote: >> To optimize a long
counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. 
Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), so we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks Roland, update looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2867647819 From roland at openjdk.org Mon May 26 08:21:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 08:21:23 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: Message-ID: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> On Fri, 23 May 2025 07:00:05 GMT, Christian Hagedorn wrote: > A few minor last comments but otherwise, it looks good to me! Thanks for all the updates, the patience and the credit! Thanks for the careful review. I applied your suggestions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2908897918 From roland at openjdk.org Mon May 26 08:36:31 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 08:36:31 GMT Subject: Integrated: 8356989: Unexpected null in C2 compiled code In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:22:08 GMT, Roland Westrelin wrote: > In the test case, a non-escaping array is initialized by an > `arraycopy` that uses this array as source and destination. Following > the `arraycopy`, one of the elements of the array is tested for > `null`. That null check is constant folded to always `null` by escape > analysis. As I understand it, the `Allocate` for the array should be > marked by EA as destination of an array copy. That state should then > be propagated by EA to uses and all destinations of an array copy > should be marked as unknown value. But EA has logic that explicitly > skips the case where an `ArrayCopy` has the same source and > destination. Removing that logic fixes the failure. This pull request has now been integrated. Changeset: ed4cd2ac Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/ed4cd2acd2d8bb92c296c5a860c76cffaff53add Stats: 68 lines in 2 files changed: 60 ins; 4 del; 4 mod 8356989: Unexpected null in C2 compiled code Reviewed-by: chagedorn, kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/25389 From roland at openjdk.org Mon May 26 08:36:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 08:36:37 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v3] In-Reply-To: References: Message-ID: > `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because > we ran into some issues where a `Type` node is sunk and then becomes > `top` but the control path of its uses doesn't become unreachable. > > 8349479 should have fixed that so that exception no longer makes > sense.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25396/files - new: https://git.openjdk.org/jdk/pull/25396/files/bebfec3f..303bb31a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25396&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25396&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25396.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25396/head:pull/25396 PR: https://git.openjdk.org/jdk/pull/25396 From roland at openjdk.org Mon May 26 08:36:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 26 May 2025 08:36:39 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v2] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:24:43 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because >> we ran into some issues where a `Type` node is sunk and then becomes >> `top` but the control path of its uses doesn't become unreachable. >> >> 8349479 should have fixed that so that exception no longer makes >> sense. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review > Maybe JDK-8319372 can now be reverted with JDK-8349479 in? I think it would be safe but I'm unclear if it's worth doing or not. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25396#issuecomment-2908955452 From mdoerr at openjdk.org Mon May 26 08:44:54 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 26 May 2025 08:44:54 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: <5vUZrd7aaB0XwfW9t0l7n5KO_UOtZXtFFLLyqTIJ7UY=.46c26338-41eb-488c-ac17-0100c7449f7a@github.com> On Mon, 26 May 2025 04:11:46 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 
>> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > switch to vector stores The large number of conditional branches may cause a regression in real life scenarios with a large variance of sizes and alignments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2908979506 From shade at openjdk.org Mon May 26 09:26:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 09:26:32 GMT Subject: RFR: 8357600: Patch nmethod flushing message to include more details [v2] In-Reply-To: <5A2bSd3m_9tKlhT38_oGxLN-fywDsODjSAD2ThA6418=.4d58b1a1-75e6-4a99-8d51-b65dc381b94a@github.com> References: <5A2bSd3m_9tKlhT38_oGxLN-fywDsODjSAD2ThA6418=.4d58b1a1-75e6-4a99-8d51-b65dc381b94a@github.com> Message-ID: On Sun, 25 May 2025 23:23:36 GMT, Cesar Soares Lucas wrote: >> Please review this patch for adding more details to nmethod flushing message. These details are particularly important when investigating interaction of JVMCI compiled code and code cache flushing heuristics. >> >> Tested on Linux x64 with JTREG tier1-3 using fastdebug and release builds. 
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Move before lock & check logging Not sure about calling `is_cold()` without a lock: looks like `_gc_epoch` counter is unsynchronized. Now that you predicated the bulk of the logging with logging checks, I think it is fine to grab `CodeCache_lock` before doing logging. ------------- PR Review: https://git.openjdk.org/jdk/pull/25402#pullrequestreview-2867769019 From shade at openjdk.org Mon May 26 09:32:32 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 09:32:32 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Wed, 21 May 2025 08:23:14 GMT, Aleksey Shipilev wrote: > Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. > > Also tidied up some comments around it. > > Additional testing; > - [x] Linux x86_64 server fastdebug, `tier1 tier2` Friendly reminder. I think this one is simple enough :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25343#issuecomment-2909050969 From amitkumar at openjdk.org Mon May 26 09:32:39 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 26 May 2025 09:32:39 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: <5vUZrd7aaB0XwfW9t0l7n5KO_UOtZXtFFLLyqTIJ7UY=.46c26338-41eb-488c-ac17-0100c7449f7a@github.com> References: <5vUZrd7aaB0XwfW9t0l7n5KO_UOtZXtFFLLyqTIJ7UY=.46c26338-41eb-488c-ac17-0100c7449f7a@github.com> Message-ID: On Mon, 26 May 2025 08:42:06 GMT, Martin Doerr wrote: > The large number of conditional branches may cause a regression in real life scenarios with a large variance of sizes and alignments. 
I can try to run the same benchmark with larger sizes. But again it wouldn't replicate the real life scenario. Could you suggest some other benchmark ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2909045991 From fjiang at openjdk.org Mon May 26 09:32:51 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 26 May 2025 09:32:51 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 02:52:01 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 src/hotspot/cpu/riscv/riscv.ad line 1883: > 1881: case Op_CountPositives: // fall through > 1882: case Op_EncodeISOArray: > 1883: return UseRVV; Should we move this part to `riscv_v.ad` too? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2106907018 From thartmann at openjdk.org Mon May 26 09:32:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 09:32:51 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: <-62gIZPBmMLYGuFC722mNPMtFsyx0C9VS8jCKRugCn8=.827df0ee-eba2-4934-87e2-ef864497308f@github.com> On Mon, 26 May 2025 06:02:45 GMT, Emanuel Peter wrote: >> I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. >> - `0`: abort vectorization, as if it was not profitable. >> - `1`: default, use profitability heuristics to determine if we should vectorize. >> - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. >> >> In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. >> >> I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. >> >> Below, I'm showing how it helps to benchmark some of the reduction cases we have been working on. >> >> And if you want a small test to experiment with, I have one at the end for you. >> >> **Note to reviewer:** This patch should not make any behavioral difference, i.e. with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch.
>> >> -------------------------------------- >> >> **Use-Case: investigate Reduction Heuristics** >> >> A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. >> >> This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. >> - https://bugs.openjdk.org/browse/JDK-8078563 >> - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html >> From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. >> >> But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/1... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix wording Marked as reviewed by thartmann (Reviewer). 
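Editorial illustration: the observation quoted above, that int/long reductions can be reordered, can be sketched in plain Java. This only demonstrates the arithmetic property (wrapping int addition is associative and commutative); it is not what C2 emits, and the lane count of 4 as well as all class and method names are hypothetical.

```java
class ReductionReorderSketch {
    // Scalar reduction in source order.
    static int sumInOrder(int[] a) {
        int s = 0;
        for (int v : a) {
            s += v;
        }
        return s;
    }

    // Lane-wise partial sums that are combined once after the main loop,
    // roughly the shape of a vectorized reduction whose final combine
    // has been moved out of the loop.
    static int sumInLanes(int[] a) {
        final int lanes = 4; // hypothetical vector width
        int[] acc = new int[lanes];
        int i = 0;
        for (; i + lanes <= a.length; i += lanes) {
            for (int l = 0; l < lanes; l++) {
                acc[l] += a[i + l];
            }
        }
        int s = 0;
        for (int l = 0; l < lanes; l++) {
            s += acc[l];
        }
        // Scalar tail for the elements that do not fill a whole "vector".
        for (; i < a.length; i++) {
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        int[] a = {Integer.MAX_VALUE, 1, 2, 3, -7, 100, Integer.MIN_VALUE, 5, 9};
        System.out.println(sumInOrder(a) == sumInLanes(a));
    }
}
```

The same argument does not carry over to float/double reductions, where reordering can change rounding.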
------------- PR Review: https://git.openjdk.org/jdk/pull/25387#pullrequestreview-2867757844 From amitkumar at openjdk.org Mon May 26 09:32:48 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 26 May 2025 09:32:48 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: <0GRo60zA8_xC5ZRlU0C-giPkDwcxi88L9-aDG5Doy7o=.ff476b1b-9936-46d6-908b-4539e78caeec@github.com> On Mon, 26 May 2025 07:36:14 GMT, Lutz Schmidt wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> switch to vector stores > > src/hotspot/cpu/s390/stubGenerator_s390.cpp line 1499: > >> 1497: >> 1498: __ z_vlvgb(Z_V0, byteVal, 0); >> 1499: __ z_vrepb(Z_V0, Z_V0, 0); > > You could also use `z_vzero(Vreg)` to preload the vector register with all zeroes. Saves an instruction. I am not loading 0 here. This is my intention: with `z_vlvgb`, putting value of `byteVal` in the first 0th index of `Z_V0` and then with `z_vrepb` replicating the `0th` index value (1 byte) to the whole register. `z_vzero` will make sense if we are zeroing out the memory but that's not the case always. We do fill some non-zero 1 byte value in most of the case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24480#discussion_r2106913366 From thartmann at openjdk.org Mon May 26 09:32:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 26 May 2025 09:32:52 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> <6OsU9gpjhuaflrB-93m3FjlTIdAkVQeVThrnWCAgc-M=.6071f477-6907-44bd-8300-6f55d37c575f@github.com> <2NKVQWxNQZS-rNdBPlm3tRCZvDkfCAlVsLvQmguH_nA=.2271fc60-76b7-4d51-be49-926fea9a9a45@github.com> Message-ID: On Mon, 26 May 2025 07:03:20 GMT, Emanuel Peter wrote: >> I think you missed my second comment here. 
> > @TobiHartmann I did miss it! This is what I get when I google "as if vs as though": >> "As if" and "as though" are generally interchangeable and have the same meaning; they are both used to compare a real situation to an imaginary or hypothetical one. "As if" is slightly more common than "as though," but both are used to express a comparison, often where something appears to be the case but may not be. > > Would you still like me to change it? Ah, no, all good then. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25387#discussion_r2106886531 From dlunden at openjdk.org Mon May 26 09:34:29 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 26 May 2025 09:34:29 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:00:58 GMT, Daniel Lundén wrote: > I've merged your changes with `master` and will rerun the benchmarks to double check. An update on this: after merging with `master` it looks like we now have more of a positive result with your changes. I will continue running more benchmarks to verify what's going on. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2909057658 From dlunden at openjdk.org Mon May 26 10:08:36 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 26 May 2025 10:08:36 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v20] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots.
This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). 
I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467+pr-updates - Fix typo - Updates after Emanuel's comments - Refactor and improve TestNestedSynchronize.java - Update comments - Revise TestNestedSynchronize to make use of CompileFramework - Revise overlap comments for frequency of cases - Update test comment to also mention timeouts - Fix suboptimal max limit in _grow - Updates after comments - ... and 19 more: https://git.openjdk.org/jdk/compare/ed4cd2ac...9cefa15f ------------- Changes: https://git.openjdk.org/jdk/pull/20404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=19 Stats: 2655 lines in 31 files changed: 2099 ins; 276 del; 280 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From rcastanedalo at openjdk.org Mon May 26 10:15:51 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 May 2025 10:15:51 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 06:33:09 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8357105.
>> >> The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. >> >> The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. >> >> Testing: >> Tier1-4. >> >> Extra testing: >> Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > add second run; - iteration count; add use of result Looks good, thanks! Please re-run testing of your latest changes (if you haven't yet) before integration. ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25395#pullrequestreview-2867932983 From jbhateja at openjdk.org Mon May 26 10:19:53 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 May 2025 10:19:53 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Wed, 21 May 2025 20:02:27 GMT, Sandhya Viswanathan wrote: >>> @jatin-bhateja I'll run some internal testing, please ping me in 24h for results! 
:) >> >> Please use the latest version > >> > @jatin-bhateja @sviswa7 Can you explain the impact of the `EVEX_HVM`, `EVEX_QVM` etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct? >> >> @eme64 In EVEX the displacement for memory in the addressing mode is encoded using compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc denote tuple type and are used to determine the scaling factor for displacement. Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in [Intel SDM Volume 2](https://cdrdv2.intel.com/v1/dl/getContent/671110). So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is non zero. > > For testing, the best way would be to create a SIMD instruction encoding test tool on similar lines as https://github.com/openjdk/jdk/commit/52d752c43b3a9935ea97051c39adf381084035cc in a separate future PR. > @sviswa7 Thanks for the explanations! Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors? > > I'll re-run testing now, just to be sure. Hi @eme64, Please let me know if your tests are clean and it's good to land this ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2909194488 From mli at openjdk.org Mon May 26 10:42:05 2025 From: mli at openjdk.org (Hamlin Li) Date: Mon, 26 May 2025 10:42:05 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) Message-ID: Hi, Can you help to review this patch? This PR is split from https://github.com/openjdk/jdk/pull/25341, and contains only the shared code change. This patch enables the vectorization of statements like `fd_1 bop fd_2 ? res_1 : res_2` in a loop. The current behaviour on other platforms supports vectorization of `fd_1 bop fd_2 ?
res_1 : res_2` in a loop only when `fd` and `res` have the same size, but this constraint does not seem necessary, at least not on riscv, so I relax this constraint on riscv. Maybe it can be relaxed on other platforms too, but currently I only made it work on riscv. Besides this, I also relax the constraint on transforming Op_CMoveI/L to Op_VectorBlend on riscv; this brings some extra benefit when `res` is not of float or double type. Both relaxations bring performance benefit via vectorization. Compared with other runs (master, master with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on, patch without flags turned on), the average improvement introduced by the patch with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on is more than 2.1 times, and in some cases it can bring more than 4 times improvement. When the flags are turned off (`-XX:-UseVectorCmov -XX:-UseCMoveUnconditionally`), there is no regression on average. See more details at: https://github.com/openjdk/jdk/pull/25341.
Thanks ------------- Commit messages: - split from pr 25341 - disable cmovei/l => vectorblend - initial commit Changes: https://git.openjdk.org/jdk/pull/25336/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25336&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357554 Stats: 120 lines in 9 files changed: 118 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25336.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25336/head:pull/25336 PR: https://git.openjdk.org/jdk/pull/25336 From shade at openjdk.org Mon May 26 11:05:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 11:05:54 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 22:12:19 GMT, Vladimir Ivanov wrote: >> src/java.base/share/classes/java/lang/ref/Reference.java line 662: >> >>> 660: * @since 9 >>> 661: */ >>> 662: @IntrinsicCandidate >> >> Sounds like we also want to restore `@DontInline` to cover the case when intrinsic is not available / disabled for some compiler. I vaguely remember some intrinsic handling code checks whether method is prohibited from inlining (maybe affects only global `-XX:-Inline`, not sure), so it might be as straightforward. > > I'd like to use `-XX:DisableIntrinsic=_Reference_reachabilityFence` to switch to current behavior (no fence). > Also, `@DontInline` would require special handling in C1 to unconditionally inline it. > > `@ForceInline` was there primarily to communicate the interaction with JVM. (Existing inlining heuristics should just unconditionally inline empty methods.) Once `@IntrinsicCandidate` is there, I don't see much value in any other annotations. But the whole point of this PR is that "current behavior" is incorrect, isn't it? I think disabling `_Reference_reachabilityFence` intrinsic (or, failing to inline the intrinsic for some other reason) should fail-safe to non-inlined method, not fail-deadly to a broken RF. 
In other words, let's not rely on intrinsic to work for correctness; non-intrinsified version should be correct as well. I agree `@DontInline` would require a bit of extra fiddling in C1, but I suspect it should be as easy as copy-pasting a few hunks around `LIRGenerator::do_blackhole`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2107085362 From zzambers at openjdk.org Mon May 26 11:09:55 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Mon, 26 May 2025 11:09:55 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: <3l0u3A6bRQ_TzLwmzQyO6aEivX6z3sdXbzYUbbXhueU=.3b94722f-e668-40ab-aa3e-5e580a81d0c6@github.com> On Wed, 21 May 2025 13:57:07 GMT, Zdenek Zambersky wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > Fix of compiler tests for client VM It seems, I'll have to rebase ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2909343156 From epeter at openjdk.org Mon May 26 11:20:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 11:20:00 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry [v5] In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Wed, 21 May 2025 01:57:35 GMT, Jatin Bhateja wrote: >> PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. 
>> AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. >> >> e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. >> >> >> 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 >> EVEX OP MR SIB DISP IMM >> --------------|----|----|----|---------------|-----| >> 62 6b c1 40 25 84 ec 40 30 20 10 ff >> >> 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 >> For full vector width operation, scalar matches with vector size, hence scale N = 64 >> effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 >> EVEX OP MR SIB DISP IMM >> -------------|----|---|---|-----------|---| >> 62 6b c1 40 25 44 ec 01 ff >> >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review Resoultions Testing passed, change looks reasonable! Looking forward to more testing in the future! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25021#pullrequestreview-2868081867 From jbhateja at openjdk.org Mon May 26 11:20:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 May 2025 11:20:01 GMT Subject: RFR: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: <6myX_RsSLuQiNeBsGoDk6PcDqJDyLdom17Lid3_Wamg=.11c2b4bc-883b-4d97-b1ce-7c4bd3e07556@github.com> On Thu, 22 May 2025 06:01:33 GMT, Emanuel Peter wrote: >>> > @jatin-bhateja @sviswa7 Can you explain the impact of the `EVEX_HVM`, `EVEX_QVM` etc, and what is the impact if we get them wrong? Performance? Wrong results? How can we test that they are correct? >>> >>> @eme64 In EVEX the displacement for memory in the addressing mode is encoded using compressed disp8 encoding scheme. The EVEX_FVM, EVEX_HVM, EVEX_QVM etc denote tuple type and are used to determine the scaling factor for displacement. Please see section "2.7.5 Compressed Displacement (disp8*N) Support in EVEX" in [Intel SDM Volume 2](https://cdrdv2.intel.com/v1/dl/getContent/671110). So to answer your question, if the tuple type is incorrect we will see wrong results if the displacement is non zero. >> >> For testing, the best way would be to create a SIMD instruction encoding test tool on similar lines as https://github.com/openjdk/jdk/commit/52d752c43b3a9935ea97051c39adf381084035cc in a separate future PR. > > @sviswa7 Thanks for the explanations! > Could we also test it with Java code that generates all sorts of address shapes, e.g. with various offsets and scaling factors? > > I'll re-run testing now, just to be sure. 
Thanks @eme64 and @sviswa7 ------------- PR Comment: https://git.openjdk.org/jdk/pull/25021#issuecomment-2909365658 From jbhateja at openjdk.org Mon May 26 11:20:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 May 2025 11:20:02 GMT Subject: Integrated: 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry In-Reply-To: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> References: <6YRY7UjPTFDr08NUvGQQs1dmBx8L2zPpXWyv-v-AUt8=.ef637d92-6b41-4698-9d85-da4ab6e64aa8@github.com> Message-ID: On Sat, 3 May 2025 15:49:24 GMT, Jatin Bhateja wrote: > PR adds missing EVEX compressed displacement attributes used for computing the scale factor (N) of compressed displacement. > AVX512 memory operand instructions use compressed disp8 encoding if the displacement is a multiple of scale (N), which depends on Vector Length, embedded broadcasting, and lane size. Please refer to section 2.7.5 of Intel SDM for more details. > > e.g., Consider two instructions, one with displacement 0x10203040 and the other with displacement 0x40, instruction operates over full 64-byte vector hence scale N = 64. Displacement of latter instruction is a multiple of scale, thus can be represented by 1 byte displacement encoding, while the former requires 4 bytes to represent displacement in instruction encoding. > > > 1) vpternlogq $0xff,0x10203040(%r20,%r21,8),%zmm23,%zmm24 > EVEX OP MR SIB DISP IMM > --------------|----|----|----|---------------|-----| > 62 6b c1 40 25 84 ec 40 30 20 10 ff > > 2) vpternlogq $0xff,0x40(%r20,%r21,8),%zmm23,%zmm24 > For full vector width operation, scalar matches with vector size, hence scale N = 64 > effective displacement / compressed DISP8 = OFFSET(64) / 64 = 0x1 > EVEX OP MR SIB DISP IMM > -------------|----|---|---|-----------|---| > 62 6b c1 40 25 44 ec 01 ff > > > Kindly review and share your feedback. 
> > Best Regards, > Jatin This pull request has now been integrated. Changeset: 7002233e Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/7002233ed943a21b49bc69ff728964d004b2d5c1 Stats: 4097 lines in 37 files changed: 4049 ins; 2 del; 46 mod 8351950: C2: AVX512 vector assembler routines causing SIGFPE / no valid evex tuple_table entry Reviewed-by: epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25021 From shade at openjdk.org Mon May 26 11:48:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 11:48:54 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 22:23:48 GMT, Vladimir Ivanov wrote: >> Now that I read the next hunk, should `is_DecodeN` be `is_DecodeNarrowPtr` to capture class loads (however unlikely that one is)? > > `ReachabilityFence` accepts only OOPs as a referent and `DecodeNKlass` produces `Klass` pointer. > > I suspect it may be the case for safepoints as well (and `is_DecodeNarrowPtr()` is a a leftover from PermGen world), but I didn't check. Right, nevermind about `DecodeNKlass` then. My question on heap loads still stands: do we actually get `reachabilityFence(someField)` from anywhere? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2107155845 From shade at openjdk.org Mon May 26 11:48:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 11:48:55 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 21:23:11 GMT, Vladimir Ivanov wrote: >> src/hotspot/share/opto/parse1.cpp line 1225: >> >>> 1223: >>> 1224: if (StressReachabilityFences) { >>> 1225: // Keep all oop arguments alive until method return. >> >> Comment says "arguments", but we save locals. Aren't arguments on "stack" in `JVMState`? 
For stress mode, would make sense to hook up both locals/stack from `JVMState`, maybe? > > It happens inside callee context, so all arguments are already moved to locals. The code could explicitly iterate over arguments (using `argument(uint)` query) or enumerate only those locals which hold arguments, but that would require a special case for receiver. Iteration over locals (`[0 ... max_locals)`) is uniform and enumerates only arguments since everything else is top. OK! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2107156627 From rehn at openjdk.org Mon May 26 11:54:59 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 26 May 2025 11:54:59 GMT Subject: Integrated: 8357056: RISC-V: Asm fixes - load/store width In-Reply-To: References: Message-ID: <0AsE5fa3oJaYn3DIESAYsbKaLAmnQCGAcbR-uz7Pemc=.1e7862e5-c234-4360-a7b0-8c5e2f9d877f@github.com> On Thu, 15 May 2025 14:46:12 GMT, Robbin Ehn wrote: > Hi, please consider. > > While working on https://github.com/openjdk/jdk/pull/25252, I notice: > - Major op code was just repeat > - Width coded in binary > - Stores have mixed up rs1 and rs2 > - Bonus, fsd used a macro for no reason > > I think this improves readability. > > Tested tier1 > > Thanks, Robbin This pull request has now been integrated. 
Changeset: daa8eda5 Author: Robbin Ehn URL: https://git.openjdk.org/jdk/commit/daa8eda530c4c3929c68ace1f1a2d1ed62331584 Stats: 289 lines in 1 file changed: 120 ins; 76 del; 93 mod 8357056: RISC-V: Asm fixes - load/store width Reviewed-by: fjiang, mli, luhenry, fyang ------------- PR: https://git.openjdk.org/jdk/pull/25253 From jbhateja at openjdk.org Mon May 26 12:03:56 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 May 2025 12:03:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Fri, 23 May 2025 13:53:23 GMT, Roberto Castañeda Lozano wrote: > > > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > > > > > > > > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers.
> > > > > > Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java > > OK, thanks for checking Jatin! > > Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. Hi @robcasloz, The patch uses the new push2/pop2 instructions, which reduce the dynamic instruction count needed to save and restore all the caller-saved registers. The new instruction sequence based on push2/pop2 saves not only EGPRs but also the existing GPRs, with a shorter JIT sequence. We verified our fix using the following standalone gtest with the Intel Software Development Emulator. [test_ZRuntimeCallSpill_cpp.txt](https://github.com/user-attachments/files/20440415/test_ZRuntimeCallSpill_cpp.txt) Given that gtests are a build-time validation and the JVM itself is built with a minimum feature set, I am hesitant to add this along with the patch. BTW, ZRuntimeCallSpill is called as part of the slow path barrier for native methods, which can modify EGPRs. Let me know if you think it's good to land.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2909487712 From aboldtch at openjdk.org Mon May 26 12:13:56 2025 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 26 May 2025 12:13:56 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v3] In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Thu, 22 May 2025 17:42:06 GMT, Jatin Bhateja wrote: >> Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. >> These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. >> ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. >> >> Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Seems fine. Eventually it would be nice if we could generalise this and have the logic in the MacroAssembler. Just a small comment about the conditional rax push and pop. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 196: > 194: __ movptr(_result, rax); > 195: __ popp(rax); > 196: } Same here. Suggestion: if (_result != rax) { if (_result != nullptr) { __ movptr(_result, rax); } __ popp(rax); } src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 211: > 209: __ movptr(_result, rax); > 210: __ pop(rax); > 211: } Was unsure if we should change the behaviour in the else branch in this PR. But it seems like an alright change. 
However, I think it is easier to see that this does the correct thing if the condition for pushing and popping are the same. Suggestion: if (_result != rax) { if (_result != noreg) { __ movptr(_result, rax); } __ pop(rax); } ------------- Changes requested by aboldtch (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25351#pullrequestreview-2868218162 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2107199288 PR Review Comment: https://git.openjdk.org/jdk/pull/25351#discussion_r2107197141 From jbhateja at openjdk.org Mon May 26 12:56:24 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 26 May 2025 12:56:24 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v4] In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate over a pair of registers resulting into an smaller save/restoration JIT code, on the hind side they have hard alignment and balancing constraints, as they operate over 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live register, thus resulting SPILL sequence should not modify the contents of the register. > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Axel's comments incorporated ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25351/files - new: https://git.openjdk.org/jdk/pull/25351/files/9b5c2ac4..bd8b9c51 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25351&range=02-03 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25351.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25351/head:pull/25351 PR: https://git.openjdk.org/jdk/pull/25351 From fjiang at openjdk.org Mon May 26 13:17:51 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 26 May 2025 13:17:51 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 02:52:01 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 Looks good. Thanks! ------------- Marked as reviewed by fjiang (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25438#pullrequestreview-2868425897 From fjiang at openjdk.org Mon May 26 13:17:52 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 26 May 2025 13:17:52 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 09:14:51 GMT, Feilong Jiang wrote: >> Hi all, >> Please take a look and review this PR, thanks! >> >> Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. >> >> ### Testing >> * [x] Linux riscv64 server release build on SG2042 > > src/hotspot/cpu/riscv/riscv.ad line 1883: > >> 1881: case Op_CountPositives: // fall through >> 1882: case Op_EncodeISOArray: >> 1883: return UseRVV; > > Should we move this part to `riscv_v.ad` too? Ah, turns out `Op_EncodeISOArray` and its friends are checked only in `match_rule_supported`, so we can not move them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2107337885 From rcastanedalo at openjdk.org Mon May 26 14:07:53 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 26 May 2025 14:07:53 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Fri, 23 May 2025 13:53:23 GMT, Roberto Castañeda Lozano wrote: >>> > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green.
>>> >>> Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. >> >> Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it >> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java > >> > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. >> > >> > >> > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? 
[This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. >> >> Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java > > OK, thanks for checking Jatin! > > Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. > Hi @robcasloz, The patch uses new push2/pop2 instructions, which reduces dynamic instruction count needed to save and restore all the caller-saved registers. New instruction sequence based on push2/pop2 not only saves EGPRs but also existing GPRs with shorter JIT sequence. We verified our fix using the following standalone gtest with the Intel Software Development Emulator. > > [test_ZRuntimeCallSpill_cpp.txt](https://github.com/user-attachments/files/20440415/test_ZRuntimeCallSpill_cpp.txt) > > Given that gtests is a build-time validation and the JVM itself is built with with minimum feature set, hence am hesitant to add this along with the patch. BTW, ZRuntimeCallSpill is called as part of the slow path barrier for native methods, which can modify EGPRs. > > Let me know if you think it's good to land in. Thanks for the details! Let me run some internal testing, since the PR affects spilling of non-extended registers too (due to special handling of `_result == rax`). Will come back with the results within a day or two. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2909871002 From dskantz at openjdk.org Mon May 26 14:25:01 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 26 May 2025 14:25:01 GMT Subject: RFR: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 06:33:09 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8357105. >> >> The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. >> >> The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. >> >> Testing: >> Tier1-4. >> >> Extra testing: >> Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > add second run; - iteration count; add use of result Thanks for the reviews and suggestions! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25395#issuecomment-2909913631 From dskantz at openjdk.org Mon May 26 14:25:02 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Mon, 26 May 2025 14:25:02 GMT Subject: Integrated: 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" In-Reply-To: References: Message-ID: On Thu, 22 May 2025 15:19:08 GMT, Daniel Skantz wrote: > This pull request contains a fix for JDK-8357105. > > The problem is performing stacked string concatenation optimization between a pair of StringBuilder.append().toString()-links SB1 and SB2, where the parameter of an append call in SB2 has a complex dependency on the result of SB1, which in turn is replaced by top() during stringopts -- similar to JDK-8271341, which had a diamond if-structure using the result of SB1, while in this case the use is an unstable If. In the attached regression test, a live part of the graph gets optimized away during later phases and ultimately the whole graph vanishes. > > The proposed solution is to simply exclude this specific case. This bug has existed for a long time and stacked concats is a niche optimization. > > Testing: > Tier1-4. > > Extra testing: > Ran Tier1-4 with an instrumented build and observed that we do not disable stacked concatenation in any previously known case after the fix. This pull request has now been integrated. 
Changeset: a300c356 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/a300c356555019a42c19bf0c16184f6dee4ad96e Stats: 66 lines in 2 files changed: 64 ins; 0 del; 2 mod 8357105: C2: compilation fails with "assert(false) failed: empty program detected during loop optimization" Reviewed-by: syan, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/25395 From jkarthikeyan at openjdk.org Mon May 26 14:30:52 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 26 May 2025 14:30:52 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 19 May 2025 14:15:05 GMT, Emanuel Peter wrote: >> @eme64 @chhagedorn Thanks a lot for the reviews! Does testing need to be run on this change, or can I integrate it? > > @jaskarth Thanks for asking. I'll run some testing, just to be safe :) @eme64 Any results with testing? :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2909931316 From dzhang at openjdk.org Mon May 26 15:11:51 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Mon, 26 May 2025 15:11:51 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 13:14:55 GMT, Feilong Jiang wrote: > Ah, turns out `Op_EncodeISOArray` and its friends are checked only in `match_rule_supported`, so we can not move them. Thanks for review! I think you're right, we don't need move this part. They are not vector operators and do not invoke `Matcher::match_rule_supported_vector`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2107531462 From epeter at openjdk.org Mon May 26 17:20:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 17:20:55 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 26 May 2025 14:27:45 GMT, Jasmine Karthikeyan wrote: >> @jaskarth Thanks for asking. I'll run some testing, just to be safe :) > > @eme64 Any results with testing? :) @jaskarth All :green_circle:, ship it! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2910308336 From epeter at openjdk.org Mon May 26 18:34:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 18:34:01 GMT Subject: RFR: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability [v3] In-Reply-To: <-62gIZPBmMLYGuFC722mNPMtFsyx0C9VS8jCKRugCn8=.827df0ee-eba2-4934-87e2-ef864497308f@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> <-62gIZPBmMLYGuFC722mNPMtFsyx0C9VS8jCKRugCn8=.827df0ee-eba2-4934-87e2-ef864497308f@github.com> Message-ID: On Mon, 26 May 2025 09:01:53 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix wording > > Marked as reviewed by thartmann (Reviewer). @TobiHartmann @vnkozlov Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25387#issuecomment-2910413209 From epeter at openjdk.org Mon May 26 18:34:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 18:34:02 GMT Subject: Integrated: 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability In-Reply-To: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> References: <-K3da45mMblFzdAPag8RqzEKg7F2gM_nnN2HlGK36uk=.4d9855ad-b6ed-4a72-a712-242e5f1f93f8@github.com> Message-ID: <3NVL_4H0qcBme9_Fe-0Fjo4_nY0XfsbiKP6To9XZ49A=.95affa05-1398-4b0a-ad53-5514db9b0b92@github.com> On Thu, 22 May 2025 08:54:42 GMT, Emanuel Peter wrote: > I'm adding a diagnostic flag `AutoVectorizationOverrideProfitability`. The goal is that with it, we can systematically benchmark our Auto Vectorization profitability heuristics. In all cases, we run Auto Vectorization, including packing. > - `0`: abort vectorization, as if it was not profitable. > - `1`: default, use profitability heuristics to determine if we should vectorize. > - `2`: always vectorize when possible, even if profitability heuristic would say that it is not profitable. > > In the future, we may change our heuristics. We may for example introduce a cost model [JDK-8340093](https://bugs.openjdk.org/browse/JDK-8340093). But at any rate, we need this flag, so that we can override these profitability heuristics, even if just for benchmarking. > > I did not yet go through all of `SuperWord` to check if there may be other decisions that could go under this flag. If we find any later, we can still add them. > > Below, I'm showing how it helps to benchmark the some reduction cases we have been working on. > > And if you want a small test to experiement with, I have one at the end for you. > > **Note to reviewer:** This patch should not make any behavioral difference, i.e. 
with the default `AutoVectorizationOverrideProfitability=1` the behavior should be as before this patch. > > -------------------------------------- > > **Use-Case: investigate Reduction Heuristics** > > A while back, I have written a comprehensive benchmark for Reductions https://github.com/openjdk/jdk/pull/21032. I saw that some cases might possibly be profitable, but we have disabled vectorization because of a heuristic. > > This heuristic was added a long time ago. The observation at the time was that simple add and mul reductions were not profitable. > - https://bugs.openjdk.org/browse/JDK-8078563 > - https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2015-April/017740.html > From the comments, it becomes clear that "simple reductions" are not profitable, that's why we check if there are more work vectors than reduction vectors. But I'm not sure why 2-element reductions are deemed always not profitable. Maybe it fit the benchmarks at the time, but now with moving reductions out of the loop, this probably does not make sense any more, at least for int/long. > > But in the meantime, I have added an improvement, where we move int/long reductions out of the loop. We can do that because int/long reductions can be reordered. See https://github.com/openjdk/jdk/pull/13056 . We cannot do that with float/double reductions,... This pull request has now been integrated. 
Changeset: e8eff4d2 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/e8eff4d25b984d503a4daa5d291b52a8d1e2f186 Stats: 233 lines in 3 files changed: 225 ins; 0 del; 8 mod 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25387 From shade at openjdk.org Mon May 26 19:01:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 19:01:38 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v20] In-Reply-To: References: Message-ID: > [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in the queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this. > > The code tries to swap the weak JNI handle for a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow. > > This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by the AOT code loader. It also does not help that `CompileTask::select_task` is effectively quadratic in the number of methods in the queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for the relevant `OopStorage`-s.
This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 35 commits: - Switch to mutable - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Spin lock induces false sharing - Merge branch 'master' into JDK-8231269-compile-task-weaks - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - ... and 25 more: https://git.openjdk.org/jdk/compare/ed4cd2ac...0c1c5d65 ------------- Changes: https://git.openjdk.org/jdk/pull/24018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=19 Stats: 429 lines in 11 files changed: 389 ins; 22 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/24018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018 PR: https://git.openjdk.org/jdk/pull/24018 From epeter at openjdk.org Mon May 26 19:00:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 19:00:06 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 08:20:50 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. 
Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. 
To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Need to hop trains, sending out a first batch :) src/hotspot/share/opto/c2_globals.hpp line 860: > 858: develop(bool, StressShortRunningLongLoop, false, \ > 859: "Speculate all long counted loops are short running when bounds " \ > 860: "are unknown even if profile data doesn't say so.") \ Why only when bounds are unknown? src/hotspot/share/opto/castnode.cpp line 327: > 325: } > 326: > 327: bool CastLLNode::inner_loop_backedge(Node* proj) { Suggestion: bool CastLLNode::is_inner_loop_backedge(Node* proj) { Optional. It would help me know that it is just a check. Otherwise, I wonder if we might "make" the inner loop backedge. src/hotspot/share/opto/castnode.cpp line 339: > 337: } > 338: > 339: bool CastLLNode::cmp_used_at_inner_loop_exit_test(Node* cmp) { Assert that `cmp` is really a `CmpNode`? ------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2869017909 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107745527 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107747988 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107751453 From shade at openjdk.org Mon May 26 19:01:38 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 26 May 2025 19:01:38 GMT Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v19] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:53:19 GMT, Axel Boldt-Christmas wrote: > Not sure what our opinion is w.r.t.
`mutable`, but how do we feel about typing the spin lock as `mutable` and keep `is_safe()` and `method*()` const. I like this a lot! Dropping `const` just to satisfy spin lock (an implementation detail) felt really awkward. New version uses `mutable`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24018#issuecomment-2910465166 From epeter at openjdk.org Mon May 26 19:00:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 19:00:07 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v22] In-Reply-To: References: Message-ID: On Wed, 21 May 2025 07:42:47 GMT, Roland Westrelin wrote: >> It could be named `ShortRunningLongLoopIterationLimit`. > > See https://github.com/openjdk/jdk/pull/21630#issuecomment-2896932626 > I don't think this makes sense. As long as we can avoid the loop nest, that should be beneficial. There's no benefit to the loop nest but it can be required for correctness. So I don't expect we want to tune anything. It could be helpful if you say what we still do: - do we still convert to int-loop? - Do we still strip-mine? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107744719 From epeter at openjdk.org Mon May 26 19:00:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 19:00:07 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 18:50:14 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/c2_globals.hpp line 860: > >> 858: develop(bool, StressShortRunningLongLoop, false, \ >> 859: "Speculate all long counted loops are short running when bounds " \ >> 860: "are unknown even if profile data doesn't say so.") \ > > Why only when bounds are unknown? I would like to see at least a hello-world test where you use this flag, to make sure it does not completely rot ;) > src/hotspot/share/opto/castnode.cpp line 327: > >> 325: } >> 326: >> 327: bool CastLLNode::inner_loop_backedge(Node* proj) { > > Suggestion: > > bool CastLLNode::is_inner_loop_backedge(Node* proj) { > > Optional. It would help me know that it is just a check. Otherwise, I wonder if we might "make" the inner loop backedge. Also: the only use is with input from `ProjNode* proj_out_or_null`, so why not require `ProjNode*` as input here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107746389 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107750111 From epeter at openjdk.org Mon May 26 19:00:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 19:00:08 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 18:54:56 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/castnode.cpp line 327: >> >>> 325: } >>> 326: >>> 327: bool CastLLNode::inner_loop_backedge(Node* proj) { >> >> Suggestion: >> >> bool CastLLNode::is_inner_loop_backedge(Node* proj) { >> >> Optional. It would help me know that it is just a check. Otherwise, I wonder if we might "make" the inner loop backedge. > > Also: the only use is with input from `ProjNode* proj_out_or_null`, so why not require `ProjNode*` as input here? It would make the pattern a bit more obvious. Otherwise, I might wonder why we are not checking if the proj is really a proj ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107750540 From epeter at openjdk.org Mon May 26 19:27:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 19:27:01 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 08:20:50 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. 
If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head.
The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Will come back to this tomorrow :) src/hotspot/share/opto/castnode.cpp line 370: > 368: return true; > 369: } > 370: } else if (cmp_or_sub->Opcode() == Op_SubI && cmp_or_sub->in(1)->find_int_con(-1) == 0) { What is this pattern? It could be helpful if there was some single line that summarizes the pattern: `cmp(.., sub(0, convl2i(CastLL)))` Does this pattern have a name / purpose we could state here? src/hotspot/share/opto/loopTransform.cpp line 130: > 128: julong uinit_con = init_con; > 129: jlong limit_con = (stride_con > 0) ? limit_type->hi_as_long() : limit_type->lo_as_long(); > 130: julong ulimit_con = limit_con; The comment is not really reassuring me here. We are possibly dealing with long values ... could they not overflow? Plus: why are we using unsigned values here? They do immediately overflow if the input values are negative, no? But I guess at least overflow is well defined / not UB? src/hotspot/share/opto/loopTransform.cpp line 138: > 136: } else if (stride_con < 0 && limit_con < init_con) { > 137: udiff = uinit_con - ulimit_con; > 138: } Can you give an argument why calculating things like this is always correct? Could there not be any overflow in the long range here? src/hotspot/share/opto/loopnode.cpp line 893: > 891: if (short_running_loop(loop, stride_con, range_checks, iters_limit)) { > 892: C->set_major_progress(); > 893: return true; Is this a `is_short_running_loop` or a `do/optimize_short_running_loop`? 
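A note on the two `loopTransform.cpp` questions above (why unsigned values are used, and whether the difference can overflow). One possible argument, sketched below in standalone C++ rather than HotSpot code: converting a signed 64-bit value to unsigned is well defined (the value modulo 2^64), and whenever `limit > init` the true distance `limit - init` lies in `[1, 2^64 - 1]`, so it fits in an unsigned 64-bit value and the wrapping subtraction returns exactly that distance. The names `unsigned_distance` and `trip_count`, and the use of `<cstdint>` types in place of `jlong`/`julong`, are assumptions made for illustration; the rounding-up division is a generic formulation and not necessarily what the patch does.

```cpp
#include <cassert>
#include <cstdint>

// Distance between init and limit in the direction of the stride, computed
// with wrapping unsigned arithmetic. Returns 0 if the loop body never runs.
// Hypothetical helper for illustration; not the HotSpot implementation.
uint64_t unsigned_distance(int64_t init, int64_t limit, int64_t stride) {
  // Signed-to-unsigned conversion is well defined: the value modulo 2^64.
  const uint64_t uinit = static_cast<uint64_t>(init);
  const uint64_t ulimit = static_cast<uint64_t>(limit);
  if (stride > 0 && limit > init) {
    return ulimit - uinit;
  }
  if (stride < 0 && limit < init) {
    return uinit - ulimit;
  }
  return 0;
}

// Number of iterations: the distance divided by |stride|, rounded up.
uint64_t trip_count(int64_t init, int64_t limit, int64_t stride) {
  assert(stride != 0 && "a counted loop needs a non-zero stride");
  const uint64_t dist = unsigned_distance(init, limit, stride);
  // Take |stride| in unsigned arithmetic so that INT64_MIN is handled too.
  const uint64_t ustride = stride > 0
      ? static_cast<uint64_t>(stride)
      : uint64_t{0} - static_cast<uint64_t>(stride);
  // Round up without computing dist + ustride - 1, which could wrap.
  return dist / ustride + (dist % ustride != 0 ? 1 : 0);
}
```

For example, `unsigned_distance(INT64_MIN, INT64_MAX, 1)` yields `UINT64_MAX`, a distance that a signed `limit - init` could not represent.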
------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2869032955 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107761420 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107778368 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107780496 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107784585 From epeter at openjdk.org Mon May 26 19:27:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 26 May 2025 19:27:02 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 18:56:38 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/castnode.cpp line 339: > >> 337: } >> 338: >> 339: bool CastLLNode::cmp_used_at_inner_loop_exit_test(Node* cmp) { > > Assert that `cmp` is really a `CmpNode`? Or test for `Cmp`, or make it a `CmpNode` :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2107755593 From sparasa at openjdk.org Mon May 26 22:59:55 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Mon, 26 May 2025 22:59:55 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> Message-ID: <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> On Fri, 23 May 2025 15:33:48 GMT, Srinivas Vamsi Parasa wrote: >> Hi Jatin (@jatin-bhateja), >> >> Incorporated the changes suggested for cpu_family and is_P6_or_later() and other minor changes. 
Please let me know if everything looks good. >> >> Thanks, >> Vamsi > >> @vamsi-parasa Testing launched, ping me again in 24h :) > > Thanks Emanuel (@eme64)! Please let me know if there are any issues with the tests. > @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) Hi Emanuel (@eme64), Thanks for the update! The new changes got approved and are ready for testing. Could you please launch the tests? Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2910733436 From fjiang at openjdk.org Tue May 27 01:12:44 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 27 May 2025 01:12:44 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v4] In-Reply-To: References: Message-ID: <0ehBjsyy-UF4Jlq917aFn8R187F61h4_Rdzg3-0vMx0=.a75868ec-957d-44c2-808f-78accf80a877@github.com> > Please consider. > As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. > > This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions.
>
>
> Before:
> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
> ArrayFill.fillByteArray        7  avgt   12  27.215 ± 0.073  ns/op
> ArrayFill.fillByteArray       15  avgt   12  32.687 ± 0.904  ns/op
> ArrayFill.fillIntArray         7  avgt   12  28.629 ± 0.006  ns/op
> ArrayFill.fillIntArray        15  avgt   12  29.351 ± 0.009  ns/op
> ArrayFill.fillShortArray       7  avgt   12  30.776 ± 0.006  ns/op
> ArrayFill.fillShortArray      15  avgt   12  31.724 ± 0.447  ns/op
> ArrayFill.zeroByteArray        7  avgt   12  27.199 ± 0.006  ns/op
> ArrayFill.zeroByteArray       15  avgt   12  32.685 ± 0.900  ns/op
> ArrayFill.zeroIntArray         7  avgt   12  28.630 ± 0.007  ns/op
> ArrayFill.zeroIntArray        15  avgt   12  29.352 ± 0.011  ns/op
> ArrayFill.zeroShortArray       7  avgt   12  30.776 ± 0.006  ns/op
> ArrayFill.zeroShortArray      15  avgt   12  31.497 ± 0.012  ns/op
>
> After:
> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
> ArrayFill.fillByteArray        7  avgt   12  20.137 ± 0.042  ns/op
> ArrayFill.fillByteArray       15  avgt   12  32.928 ± 0.004  ns/op
> ArrayFill.fillIntArray         7  avgt   12  28.630 ± 0.004  ns/op
> ArrayFill.fillIntArray        15  avgt   12  29.344 ± 0.005  ns/op
> ArrayFill.fillShortArray       7  avgt   12  31.494 ± 0.004  ns/op
> ArrayFill.fillShortArray      15  avgt   12  31.492 ± 0.008  ns/op
> ArrayFill.zeroByteArray        7  avgt   12  19.980 ± 0.164  ns/op
> ArrayFill.zeroByteArray       15  avgt   12  32.927 ± 0.004  ns/op
> ArrayFill.zeroIntArray         7  avgt   12  28.629 ± 0.005  ns/op
> ArrayFill.zeroIntArray        15  avgt   12  29.346 ± 0.006  ns/op
> ArrayFill.zeroShortArray       7  avgt   12  32.193 ± 0.027  ns/op
> ArrayFill.zeroShortArray      15  avgt   12  31.495 ± 0.010  ns/op
>
>
> Testing:
> - [x] tier1
Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'openjdk:master' into riscv-optimize-generate-fill - Merge branch 'openjdk:master' into riscv-optimize-generate-fill - Merge branch 'openjdk:master' into riscv-optimize-generate-fill - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill - optimize array fill stub for small size ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25350/files - new: https://git.openjdk.org/jdk/pull/25350/files/54bab0bb..39e98e7c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25350&range=02-03 Stats: 11444 lines in 229 files changed: 7461 ins; 3010 del; 973 mod Patch: https://git.openjdk.org/jdk/pull/25350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25350/head:pull/25350 PR: https://git.openjdk.org/jdk/pull/25350 From jkarthikeyan at openjdk.org Tue May 27 02:07:59 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 27 May 2025 02:07:59 GMT Subject: RFR: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: <_ZX5WU8JRDjnkJg7YiE29_Pfzp6LhgpWxKBAfo9rKBE=.a7fc5946-b082-43f3-9ba6-e256304e43e3@github.com> Message-ID: On Mon, 26 May 2025 17:18:13 GMT, Emanuel Peter wrote: >> @eme64 Any results with testing? :) > > @jaskarth All :green_circle: , ship it! Thanks for the testing @eme64, and thank you for the review @chhagedorn!
------------- PR Comment: https://git.openjdk.org/jdk/pull/25243#issuecomment-2910900697 From jkarthikeyan at openjdk.org Tue May 27 02:07:59 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Tue, 27 May 2025 02:07:59 GMT Subject: Integrated: 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 In-Reply-To: References: Message-ID: On Thu, 15 May 2025 02:29:11 GMT, Jasmine Karthikeyan wrote: > Hi all, > This is a small patch to TestVectorZeroCount to make it only execute when C2 is enabled, to fix a timeout with -XX:TieredStopAtLevel=3. This test takes a long time to finish without C2 because it iterates through all of the integers twice. Since the intention of the test is to stress the C2-specific `numberOfLeadingZeros` and `numberOfTrailingZeros` intrinsics, I think it makes sense to limit it to running with C2 only. > > Reviews would be appreciated! This pull request has now been integrated. Changeset: 37d04a1e Author: Jasmine Karthikeyan URL: https://git.openjdk.org/jdk/commit/37d04a1e365d005afec3651c5e25fdceeceb9313 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8355512: Test compiler/vectorization/TestVectorZeroCount.java times out with -XX:TieredStopAtLevel=3 Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/25243 From fjiang at openjdk.org Tue May 27 03:41:59 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 27 May 2025 03:41:59 GMT Subject: RFR: 8357460: RISC-V: Optimize array fill stub for small size [v4] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 00:44:26 GMT, Fei Yang wrote: >> Feilong Jiang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into riscv-optimize-generate-fill >> - Merge branch 'openjdk:master' into riscv-optimize-generate-fill >> - Merge branch 'openjdk:master' into riscv-optimize-generate-fill >> - Merge branch 'master' of https://github.com/openjdk/jdk into riscv-optimize-generate-fill >> - optimize array fill stub for small size > > Looks good. Thanks. @RealFYang @Anjian-Wen Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25350#issuecomment-2911002676 From fjiang at openjdk.org Tue May 27 03:42:00 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 27 May 2025 03:42:00 GMT Subject: Integrated: 8357460: RISC-V: Optimize array fill stub for small size In-Reply-To: References: Message-ID: On Wed, 21 May 2025 11:40:37 GMT, Feilong Jiang wrote: > Please consider. > As discussed in https://github.com/openjdk/jdk/pull/23890#discussion_r2094920943, we can also further optimize the array fill stub by unrolling the storage of values when the size is less than 8. > > This PR also removes the **aligned tail part** with the consideration of code size and testing coverage. As the test reveals there are no significant regressions.
>
>
> Before:
> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
> ArrayFill.fillByteArray        7  avgt   12  27.215 ± 0.073  ns/op
> ArrayFill.fillByteArray       15  avgt   12  32.687 ± 0.904  ns/op
> ArrayFill.fillIntArray         7  avgt   12  28.629 ± 0.006  ns/op
> ArrayFill.fillIntArray        15  avgt   12  29.351 ± 0.009  ns/op
> ArrayFill.fillShortArray       7  avgt   12  30.776 ± 0.006  ns/op
> ArrayFill.fillShortArray      15  avgt   12  31.724 ± 0.447  ns/op
> ArrayFill.zeroByteArray        7  avgt   12  27.199 ± 0.006  ns/op
> ArrayFill.zeroByteArray       15  avgt   12  32.685 ± 0.900  ns/op
> ArrayFill.zeroIntArray         7  avgt   12  28.630 ± 0.007  ns/op
> ArrayFill.zeroIntArray        15  avgt   12  29.352 ± 0.011  ns/op
> ArrayFill.zeroShortArray       7  avgt   12  30.776 ± 0.006  ns/op
> ArrayFill.zeroShortArray      15  avgt   12  31.497 ± 0.012  ns/op
>
> After:
> Benchmark                 (size)  Mode  Cnt   Score   Error  Units
> ArrayFill.fillByteArray        7  avgt   12  20.137 ± 0.042  ns/op
> ArrayFill.fillByteArray       15  avgt   12  32.928 ± 0.004  ns/op
> ArrayFill.fillIntArray         7  avgt   12  28.630 ± 0.004  ns/op
> ArrayFill.fillIntArray        15  avgt   12  29.344 ± 0.005  ns/op
> ArrayFill.fillShortArray       7  avgt   12  31.494 ± 0.004  ns/op
> ArrayFill.fillShortArray      15  avgt   12  31.492 ± 0.008  ns/op
> ArrayFill.zeroByteArray        7  avgt   12  19.980 ± 0.164  ns/op
> ArrayFill.zeroByteArray       15  avgt   12  32.927 ± 0.004  ns/op
> ArrayFill.zeroIntArray         7  avgt   12  28.629 ± 0.005  ns/op
> ArrayFill.zeroIntArray        15  avgt   12  29.346 ± 0.006  ns/op
> ArrayFill.zeroShortArray       7  avgt   12  32.193 ± 0.027  ns/op
> ArrayFill.zeroShortArray      15  avgt   12  31.495 ± 0.010  ns/op
>
>
> Testing:
> - [x] tier1
This pull request has now been integrated. Changeset: 78d0dc75 Author: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/78d0dc75029dba7b4ba388f9a7f5f7b22e4b838e Stats: 61 lines in 1 file changed: 9 ins; 27 del; 25 mod 8357460: RISC-V: Optimize array fill stub for small size Reviewed-by: wenanjian, fyang ------------- PR: https://git.openjdk.org/jdk/pull/25350 From fyang at openjdk.org Tue May 27 03:55:00 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 27 May 2025 03:55:00 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 02:52:01 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! > > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD).
These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 src/hotspot/cpu/riscv/riscv_v.ad line 127: > 125: case Op_FmaVF: > 126: case Op_FmaVD: > 127: return UseRVV && UseFMA; Since the `UseRVV` flag has already been checked on function entry at L58, the `UseRVV` check here could be removed. Similar for case `Op_RoundVF` and case `Op_RoundVD`. src/hotspot/cpu/riscv/riscv_v.ad line 132: > 130: // regression when MaxVectorSize == 16. So only enable the intrinsic when MaxVectorSize >= 32. > 131: case Op_RoundVF: > 132: return UseRVV && MaxVectorSize >= 32; Could we use the input vector length `vlen` to do the check? Maybe `return vlen >= 8;` will do. src/hotspot/cpu/riscv/riscv_v.ad line 139: > 137: // regression for double when MaxVectorSize == 64+. So only enable the intrinsic when MaxVectorSize >= 64. > 138: case Op_RoundVD: > 139: return UseRVV && MaxVectorSize >= 64; Similar here. Maybe `return vlen >= 8;` will do. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2108101750 PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2108103657 PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2108104091 From dzhang at openjdk.org Tue May 27 04:15:50 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 27 May 2025 04:15:50 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v2] In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: > Hi all, > Please take a look and review this PR, thanks!
> > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Remove redundant UseRVV and use vlen as the condition ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25438/files - new: https://git.openjdk.org/jdk/pull/25438/files/dba658b5..8256e0d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=00-01 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/25438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25438/head:pull/25438 PR: https://git.openjdk.org/jdk/pull/25438 From dzhang at openjdk.org Tue May 27 04:27:45 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 27 May 2025 04:27:45 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v3] In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: > Hi all, > Please take a look and review this PR, thanks! > > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. 
> > ### Testing > * [x] Linux riscv64 server release build on SG2042 Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Merge Op_RoundVF and Op_RoundVD ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25438/files - new: https://git.openjdk.org/jdk/pull/25438/files/8256e0d7..2ee859b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=01-02 Stats: 5 lines in 1 file changed: 1 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25438/head:pull/25438 PR: https://git.openjdk.org/jdk/pull/25438 From dzhang at openjdk.org Tue May 27 04:27:46 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 27 May 2025 04:27:46 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v3] In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Tue, 27 May 2025 03:48:58 GMT, Fei Yang wrote: >> Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: >> >> Merge Op_RoundVF and Op_RoundVD > > src/hotspot/cpu/riscv/riscv_v.ad line 127: > >> 125: case Op_FmaVF: >> 126: case Op_FmaVD: >> 127: return UseRVV && UseFMA; > > Since `UseRVV` flag has already been checked on function entry at L58, the `UseRVV` check here could be removed. > Similar for case `Op_RoundVF` and case `Op_RoundVD`. Thanks for review! Fixed. > src/hotspot/cpu/riscv/riscv_v.ad line 132: > >> 130: // regression when MaxVectorSize == 16. So only enable the intrinsic when MaxVectorSize >= 32. >> 131: case Op_RoundVF: >> 132: return UseRVV && MaxVectorSize >= 32; > > Could we use the input vector length `vlen` to do the check? Maybe `return vlen >= 8;` will do. Fixed. 
> src/hotspot/cpu/riscv/riscv_v.ad line 139: > >> 137: // regression for double when MaxVectorSize == 64+. So only enable the intrinsic when MaxVectorSize >= 64. >> 138: case Op_RoundVD: >> 139: return UseRVV && MaxVectorSize >= 64; > > Similar here. Maybe `return vlen >= 8;` will do. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2108131681 PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2108131728 PR Review Comment: https://git.openjdk.org/jdk/pull/25438#discussion_r2108131755 From epeter at openjdk.org Tue May 27 06:45:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 06:45:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> Message-ID: On Mon, 26 May 2025 22:57:11 GMT, Srinivas Vamsi Parasa wrote: >>> @vamsi-parasa Testing launched, ping me again in 24h :) >> >> Thanks Emanuel (@eme64)! Please let me know if there are any issues with the tests. > >> @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) > > Hi Emanuel (@eme64), > > Thanks for the update! The new changes got approved and are ready for testing. > Could you please launch the tests? > > Thanks, > Vamsi @vamsi-parasa Launched!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2911305988 From dzhang at openjdk.org Tue May 27 07:06:45 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Tue, 27 May 2025 07:06:45 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v4] In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: > Hi all, > Please take a look and review this PR, thanks! > > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: Fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25438/files - new: https://git.openjdk.org/jdk/pull/25438/files/2ee859b5..7a0e10df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25438&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25438/head:pull/25438 PR: https://git.openjdk.org/jdk/pull/25438 From rcastanedalo at openjdk.org Tue May 27 07:31:53 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 07:31:53 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill [v4] In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> 
Message-ID: On Mon, 26 May 2025 12:56:24 GMT, Jatin Bhateja wrote: >> Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. >> These instructions operate over a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate over a 16-byte aligned stack address. >> ZRuntimeCallSpill is agnostic to live registers, thus the resulting spill sequence should not modify the contents of the registers. >> >> Patch has been verified using Intel SDE; all tests under test/hotspot/jtreg/compiler/gcbarriers are green. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Axel's comments incorporated Test results look good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25351#pullrequestreview-2869935527 From mchevalier at openjdk.org Tue May 27 07:39:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 07:39:37 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow Message-ID: There is nothing very wrong here: - the graph is not broken - the algorithm is correct It just happens that the graph is very deep: it has a very long, narrow chain of nodes because of crazy unrolling, caused by `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of changes, even ones that seem harmless to me. The fix is also pretty direct: let's replace the recursive traversal with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. How sad stacks are still so bounded...
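[Editorial illustration, not part of the original message or of the patch.] The recursion-to-worklist change described above can be sketched on a toy graph. This is only a minimal sketch under stated assumptions: `ToyNode`, `count_reachable`, and `make_chain` are hypothetical names invented for the example, and the code does not reproduce `PhaseCFG::set_next_call` or any C2 data structure. It only shows why an explicit worklist keeps the native stack depth constant on a long, narrow chain of nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compiler IR node: it only records the
// indices of the nodes it uses as inputs.
struct ToyNode {
  std::vector<std::size_t> inputs;
};

// Visits every node reachable from `root` through input edges and returns
// how many were visited. The pending nodes live in a heap-allocated
// worklist, so the depth of the graph does not grow the native call stack.
std::size_t count_reachable(const std::vector<ToyNode>& graph, std::size_t root) {
  std::vector<bool> visited(graph.size(), false);
  std::vector<std::size_t> worklist;
  worklist.push_back(root);
  visited[root] = true;
  std::size_t count = 0;
  while (!worklist.empty()) {
    const std::size_t n = worklist.back();
    worklist.pop_back();
    ++count;
    for (std::size_t in : graph[n].inputs) {
      if (!visited[in]) {
        visited[in] = true;  // mark on push so a node is queued only once
        worklist.push_back(in);
      }
    }
  }
  return count;
}

// Builds a long, narrow chain: node i uses node i - 1 as its only input.
std::vector<ToyNode> make_chain(std::size_t length) {
  std::vector<ToyNode> graph(length);
  for (std::size_t i = 1; i < length; ++i) {
    graph[i].inputs.push_back(i - 1);
  }
  return graph;
}
```

Calling `count_reachable(make_chain(1000000), 999999)` walks a chain of one million nodes; a naive recursive visit of the same chain would need one stack frame per node, which is the failure mode the message describes.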
------------- Commit messages: - Make a non-recursive PhaseCFG::set_next_call for insanely deep graphs Changes: https://git.openjdk.org/jdk/pull/25448/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25448&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357781 Stats: 83 lines in 3 files changed: 75 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25448/head:pull/25448 PR: https://git.openjdk.org/jdk/pull/25448 From rcastanedalo at openjdk.org Tue May 27 07:46:43 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 07:46:43 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6] In-Reply-To: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). > > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). 
Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Include address mode test in 'legitimize_address' - Excluded IR checks for testLoadVolatile on PPC64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25066/files - new: https://git.openjdk.org/jdk/pull/25066/files/b92500a2..cf4f3b30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25066&range=04-05 Stats: 12 lines in 2 files changed: 3 ins; 2 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/25066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25066/head:pull/25066 PR: https://git.openjdk.org/jdk/pull/25066 From rcastanedalo at openjdk.org Tue May 27 07:46:44 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 07:46:44 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Fri, 16 May 2025 08:33:38 GMT, Axel Boldt-Christmas wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with one additional commit since the last revision: >> >> Replace control type with PhaseCFG::is_CFG test > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 141: > >> 139: Address legitimize_address(const Address &a, int size, Register scratch) { >> 140: if (a.getMode() == Address::base_plus_offset) { >> 141: if (legitimize_address_requires_lea(a, size)) { > > It is a little strange that `legitimize_address_requires_lea` is only the second condition and not > > return a.getMode() == Address::base_plus_offset && !Address::offset_ok_for_immed(a.offset(), exact_log2(size)); > > > And have the check in `legitimize_address` simply be `if (legitimize_address_requires_lea(a, size))` > > I guess we never end up calling
`legitimize_address_requires_lea` with a literal address, where it would assert in `a.offset()`. But requiring the Address parameter of legitimize_address_requires_lea to be in a specific mode as a precondition seems weird to me. Thanks @xmas92, I fully agree, done (commit cf4f3b30). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2108444241 From rcastanedalo at openjdk.org Tue May 27 07:49:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 07:49:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <7A1m0eMpB5HCjbOZSGwjngpdkfTWMdTu_YjsqNif-Gk=.4d3d2d29-013e-4721-897e-d8e9f81f786d@github.com> On Fri, 16 May 2025 09:33:11 GMT, Martin Doerr wrote: > I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. Thanks for testing and reporting @TheRealMDoerr, I agree that it would be too much complexity for little return. I disabled the rule for PPC64 (commit fdf34f90). Please let me know if that works as expected. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2911516324 From rcastanedalo at openjdk.org Tue May 27 07:49:54 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 07:49:54 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: <7A1m0eMpB5HCjbOZSGwjngpdkfTWMdTu_YjsqNif-Gk=.4d3d2d29-013e-4721-897e-d8e9f81f786d@github.com> References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> <7A1m0eMpB5HCjbOZSGwjngpdkfTWMdTu_YjsqNif-Gk=.4d3d2d29-013e-4721-897e-d8e9f81f786d@github.com> Message-ID: On Tue, 27 May 2025 07:46:22 GMT, Roberto Castañeda Lozano wrote: >> Thanks for implementing it and thanks for the ping. It basically works on PPC64, but one IR rule is failing: >> >> Failed IR Rules (1) of Methods (1) >> ---------------------------------- >> 1) Method "static java.lang.Object compiler.gcbarriers.TestImplicitNullChecks.testLoadVolatile(compiler.gcbarriers.TestImplicitNullChecks$OuterWithVolatileField)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={FINAL_CODE}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#NULL_CHECK#_", "1"}, applyIfPlatformOr={}, applyIfPlatform={"aarch64", "false"}, failOn={}, applyIfOr={"UseZGC", "true", "UseG1GC", "true"}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "Final Code": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\d+(\s){2}(NullCheck.*)+(\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched!
>> >> >> This is probably because PPC64 uses a membar_volatile before volatile load, so the graph looks differently: >> >> 33 Prolog === [[ ]] [2380000000033] >> 9 MachProj === 10 [[ 8 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> R3 11 MachProj === 10 [[ 8 26 ]] #5 Oop:compiler/gcbarriers/TestImplicitNullChecks$OuterWithVolatileField * !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 12 MachProj === 10 [[ 4 17 ]] #1/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 13 MachProj === 10 [[ 4 21 ]] #2/unmatched Memory: @BotPTR *+bot, idx=Bot; !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> R1 14 MachProj === 10 [[ 4 2 17 ]] #3 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 15 MachProj === 10 [[ 4 17 ]] #4 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:-1 (line 85) >> 0 Con === 10 [[ ]] #top >> 8 zeroCheckP_reg_imm0 === 9 11 [[ 7 22 ]] P=0.000001, C=-1.000000 !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) >> >> BB#002: >> 31 Region === 31 22 [[ 31 21 26 ]] >> 21 membar_volatile === 31 0 13 0 0 [[ 20 23 ]] !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) >> 20 MachProj === 21 [[ 19 ]] #0/unmatched !jvms: TestImplicitNullChecks::testLoadVolatile @ bci:1 (line 85) >> 23 MachProj === 21 ... > >> I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. > > Thanks for testing and reporting @TheRealMDoerr, I agree that it would be too much complexity for little return. I disabled the rule for PPC64 (commit fdf34f90). Please let me know if that works as expected. > @robcasloz : Hi, Thanks for the ping! I performed tier1-3 tests on linux-riscv64 platform, result is good. The new test `test/hotspot/jtreg/compiler/gcbarriers/TestImplicitNullChecks.java` also pass when running with G1 and ZGC using fastdebug build. 
@RealFYang Thanks for testing and reporting! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2911518019 From epeter at openjdk.org Tue May 27 07:57:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 07:57:07 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 08:20:50 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. 
Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Nice work, @rwestrel! And quite an impressive set of tests as well :) We should definitely hold off until JDK26 though, so we have enough time to fix possible follow-ups. src/hotspot/share/opto/loopTransform.cpp line 144: > 142: if (utrip_count * ABS(stride_con) != udiff) { > 143: // Guaranteed to not overflow because it can only happen for stride > 1 in which case, utrip_count can't be > 144: // max_juint I'm struggling a little to know what you are concerned could overflow here. The increment below? Would it overflow the long range or int range? And what exactly is the connection between `stride > 1` and `utrip_count` not being `max_juint`? Should it maybe be `ABS(stride) > 1`, or what exactly happens with negative `stride`?
src/hotspot/share/opto/loopnode.cpp line 1096: > 1094: if (ShortRunningLongLoop) { > 1095: add_parse_predicate(Deoptimization::Reason_short_running_long_loop, inner_head, outer_ilt, cloned_sfpt); > 1096: } What happens if there are multiple loops in a method, and one traps because it is longer than expected? Do we then also re-compile the other loop without the new predicate? src/hotspot/share/opto/loopnode.cpp line 1138: > 1136: return _phase->is_member(_ilt, _phase->get_ctrl(node)); > 1137: } > 1138: }; This feels pretty general, and does not seem to have anything specifically to do with short-loop-body. Or does it? I wonder if we should name it more generically, so it could be reused for other purposes later? src/hotspot/share/opto/loopnode.cpp line 1141: > 1139: > 1140: // Make a copy of Parse/Template Assertion predicates below existing predicates at the loop passed as argument > 1141: class CloneShortLoopPredicatesVisitor : public PredicateVisitor { Suggestion: class CloneShortLoopPredicateVisitor : public PredicateVisitor { Nit: if you inherit from a `PredicateVisitor` without the `s`, you probably don't want to introduce the `s`. src/hotspot/share/opto/loopnode.cpp line 1178: > 1176: // - CountedLoop: Can be reused. > 1177: bool PhaseIdealLoop::short_running_loop(IdealLoopTree* loop, jint stride_con, const Node_List &range_checks, > 1178: uint iters_limit) { Are only long-loops allowed, or can there be int loops here too? You talk about long range checks, which makes me believe this is about long loops. But then below you check for the `bt` of the head, so could that be int? If it is about longs only: it would be nice to have an assert, and I would also rename the method to include the `long`. You may also want to say what the return `bool` means (success, the loop was indeed a short running (long?) loop).
src/hotspot/share/opto/loopnode.cpp line 1188: > 1186: loop->compute_trip_count(this, bt); > 1187: // Loop must run for no more than iter_limits as it guarantees no overflow of scale * iv in long range checks. > 1188: // iters_limit / ABS(stride_con) is the largest trip count for which we know it's correct to not create a loop nest: Can you point to somewhere that argues this in more detail? Especially the thing about overflows sounds important. src/hotspot/share/opto/loopnode.cpp line 1235: > 1233: // Template Assertion Predicates > 1234: // | > 1235: // Loop What do you mean by `future predicates`? Where would they be in the ACSII art? src/hotspot/share/opto/loopnode.cpp line 1255: > 1253: assert(short_running_loop_predicate_proj->in(0)->is_ParsePredicate(), "must be parse predicate"); > 1254: > 1255: jlong limit_long = iters_limit; Suggestion: const jlong iters_limit_long = iters_limit; src/hotspot/share/opto/loopnode.cpp line 1278: > 1276: #endif > 1277: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl); > 1278: } else if (bt == T_LONG) { Suggestion: } else if (bt == T_LONG) { assert(known_short_running_loop, "follows from earlier bailout check."); I suppose that is what you mean by `won't need loop limit checks (iters_limit guarantees that)`, right? src/hotspot/share/opto/loopnode.cpp line 1279: > 1277: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl); > 1278: } else if (bt == T_LONG) { > 1279: // We're turning a long counted loop into a regular loop that will be converted into an int count loop. That loop Suggestion: // We're turning a long counted loop into a regular loop that will be converted into an int counted loop. That loop src/hotspot/share/opto/loopnode.cpp line 1289: > 1287: ConstraintCastNode::UnconditionalDependency, bt); > 1288: register_new_node(new_limit, predicates.entry()); > 1289: } What happens in the "silent" else case? I suppose that is the `bt == T_INT` case with `known_short_running_loop`. 
Can you at least add a comment, and maybe some asserts here, so the reader does not have to wonder if this was forgotten? ;) src/hotspot/share/opto/loopnode.cpp line 1293: > 1291: > 1292: if (bt == T_LONG) { > 1293: new_limit = new ConvL2INode(new_limit); Can you add a quick comment, about why this does not overflow? src/hotspot/share/opto/predicates.hpp line 83: > 81: * int counted loops with long range checks for which a loop nest also needs to be created > 82: * in the general case (so the transformation of long range checks to int range checks is > 83: * legal). `Short Short`? test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 60: > 58: /* > 59: * @test id=byte-array-AlignVector-NoShortRunningLongLoop > 60: * @bug 8329273 8348263 Suggestion: * @bug 8329273 8342692 * @summary Test vectorization of loops over MemorySegment * @library /test/lib / * @run driver compiler.loopopts.superword.TestMemorySegment ByteArray NoShortRunningLongLoop */ /* * @test id=byte-array-AlignVector-NoShortRunningLongLoop * @bug 8329273 8348263 8342692 ------------- PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2869815948 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108331756 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108338966 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108347849 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108351142 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108372147 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108380839 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108402122 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108427308 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108420226 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108420494 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21630#discussion_r2108422688 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108431813 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108452432 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108461752 From epeter at openjdk.org Tue May 27 07:57:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 07:57:07 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> References: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> Message-ID: On Mon, 26 May 2025 07:54:28 GMT, Roland Westrelin wrote: >> A few minor last comments but otherwise, it looks good to me! Thanks for all the updates, the patience and the credit! > >> A few minor last comments but otherwise, it looks good to me! Thanks for all the updates, the patience and the credit! > > Thanks for the careful review. I applied your suggestions. @rwestrel Let me know if you want us to run some extra testing. Christian said that you might be planning to wait until the JDK26 fork, and merge then, and then we can run testing. Up to you :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2911537944 From epeter at openjdk.org Tue May 27 07:57:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 07:57:08 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 07:11:12 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopnode.cpp line 1178: > >> 1176: // - CountedLoop: Can be reused. 
>> 1177: bool PhaseIdealLoop::short_running_loop(IdealLoopTree* loop, jint stride_con, const Node_List &range_checks, >> 1178: uint iters_limit) { > > Are only long-loops allowed, or can there be int loops here too? > You talk about long range checks, which makes me believe this is about long loops. But then below you check for the `bt` of the head, so could that be int? > > If it is about longs only: it would be nice to have an assert, and I would also rename the method to include the `long`. > > You may also want to say what the return `bool` means (success, the loop was indeed a short running (long?) loop). Hmm, or does the `int` `bt` come from `StressLongCountedLoop`, which complicates things here? > src/hotspot/share/opto/loopnode.cpp line 1188: > >> 1186: loop->compute_trip_count(this, bt); >> 1187: // Loop must run for no more than iter_limits as it guarantees no overflow of scale * iv in long range checks. >> 1188: // iters_limit / ABS(stride_con) is the largest trip count for which we know it's correct to not create a loop nest: > > Can you point to somewhere that argues this in more detail? Especially the thing about overflows sounds important. Where are the assumptions about what is needed for correctness coming from? > src/hotspot/share/opto/loopnode.cpp line 1255: > >> 1253: assert(short_running_loop_predicate_proj->in(0)->is_ParsePredicate(), "must be parse predicate"); >> 1254: >> 1255: jlong limit_long = iters_limit; > > Suggestion: > > const jlong iters_limit_long = iters_limit; To keep name consistency. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108393380 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108383543 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108428465 From epeter at openjdk.org Tue May 27 07:57:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 07:57:08 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: <4ejvO8emCHE-QpMRGYDht30RlDjfSZh86tLHq7uzYpM=.f43a8e15-353b-4fcd-b342-728f3affa0f1@github.com> On Tue, 27 May 2025 07:20:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 1178: >> >>> 1176: // - CountedLoop: Can be reused. >>> 1177: bool PhaseIdealLoop::short_running_loop(IdealLoopTree* loop, jint stride_con, const Node_List &range_checks, >>> 1178: uint iters_limit) { >> >> Are only long-loops allowed, or can there be int loops here too? >> You talk about long range checks, which makes me believe this is about long loops. But then below you check for the `bt` of the head, so could that be int? >> >> If it is about longs only: it would be nice to have an assert, and I would also rename the method to include the `long`. >> >> You may also want to say what the return `bool` means (success, the loop was indeed a short running (long?) loop). > > Hmm, or does the `int` `bt` come from `StressLongCountedLoop`, which complicates things here? Could `iters_limit` be constant? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2108397256 From epeter at openjdk.org Tue May 27 07:59:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 07:59:55 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> Message-ID: On Wed, 21 May 2025 10:32:33 GMT, Roberto Castañeda Lozano wrote: > > > I also think it would be good to investigate, separately, early elimination of dead array allocations, even after the integration of this work. Dead allocations may inhibit later optimizations so it would be good to eliminate them as early as possible anyway. One difficulty (not addressed in [c28f81a](https://github.com/openjdk/jdk/commit/c28f81a7ef2a4f3d3cb761ea23a80c09276e7e58)) is that early array elimination should still generate the nonnegative array size check code. > > > > > > That makes sense. It would be useful to have a bug to track that one. > > Turns out there is one already: [JDK-8180290](https://bugs.openjdk.org/browse/JDK-8180290), I just added a comment there. Should we link it on JIRA?
------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2911544225 From bkilambi at openjdk.org Tue May 27 08:03:56 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 27 May 2025 08:03:56 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v4] In-Reply-To: References: <5OezxGXLCvvauaNiX7FkOacjbwvvB-sc3k8MdEjKmwo=.8d69862a-feee-4dbe-bcf9-b53620f823f7@github.com> <2gqcpkBJkkrb21MZlPPHMWqcVWRNEz0KXdolQxd8CkI=.80a3abdb-7a11-4aec-b307-66874de81757@github.com> <4DqS6Sfys6GAONfkAQk8KNGyuqD7UvgJ8t5taO99goU=.907d894e-872a-45de-b9cb-aaf44a3ffd35@github.com> Message-ID: On Thu, 22 May 2025 15:05:33 GMT, Emanuel Peter wrote: >> Hi @eme64 Hope the tests have passed ! > > @Bhavana-Kilambi We are at about `75%`, often the last `25%` takes a little longer when some platforms have a high load. Just ping me again tomorrow ;) @eme64 Thanks a lot for the testing. I will integrate the patch now ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2911553522 From duke at openjdk.org Tue May 27 08:03:58 2025 From: duke at openjdk.org (duke) Date: Tue, 27 May 2025 08:03:58 GMT Subject: RFR: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations [v5] In-Reply-To: <-LdLobbf_wMuaEd7e4ietBnJHhDBJFUWk7Hw2EdnmuY=.c7fa1951-8874-4f56-afe7-75341e748899@github.com> References: <-LdLobbf_wMuaEd7e4ietBnJHhDBJFUWk7Hw2EdnmuY=.c7fa1951-8874-4f56-afe7-75341e748899@github.com> Message-ID: <8vyduN5Qn7vFazvCS5ev34hMDXqD7SHhH0LSnXGiVKU=.795372cf-02e7-457f-ae47-e0c57cc6daa5@github.com> On Thu, 22 May 2025 11:58:18 GMT, Bhavana Kilambi wrote: >> This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. 
>> >> Testing: >> JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` > > Bhavana Kilambi has updated the pull request incrementally with one additional commit since the last revision: > > Reduce @Warmup from 10000 to 50 @Bhavana-Kilambi Your change (at version 710edeecc1ec2acd5e808f9d5160c81d48976696) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25096#issuecomment-2911557903 From mchevalier at openjdk.org Tue May 27 08:06:37 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 08:06:37 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v5] In-Reply-To: References: Message-ID: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. 
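The arithmetic behind the bound discussed in the description above can be checked mechanically. This is only a minimal sketch, assuming `max_juint` is 2^32-1 as in HotSpot; the constant and function names below are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: mirrors HotSpot's max_juint, the largest 32-bit unsigned value.
constexpr uint64_t kMaxJuint = 0xFFFFFFFFull;

// Exclusive bound used by the old assert: trip_count < max_juint / 2.
constexpr uint64_t old_exclusive_bound() { return kMaxJuint / 2; }

// Inclusive bound argued for in the description: trip_count <= 2^31.
constexpr uint64_t proposed_inclusive_bound() { return uint64_t{1} << 31; }

constexpr bool old_assert_accepts(uint64_t trip_count) {
  return trip_count < old_exclusive_bound();
}

constexpr bool proposed_assert_accepts(uint64_t trip_count) {
  return trip_count <= proposed_inclusive_bound();
}

// floor((2^32 - 1) / 2) is 2^31 - 1, so the old assert already rejects 2^31 - 1.
static_assert(old_exclusive_bound() == (uint64_t{1} << 31) - 1,
              "old bound is 2^31 - 1");
static_assert(!old_assert_accepts((uint64_t{1} << 31) - 1),
              "old assert rejects 2^31 - 1");
static_assert(proposed_assert_accepts(uint64_t{1} << 31),
              "proposed bound admits 2^31");

// Runtime restatement of the same facts, convenient to call from a test.
inline void check_trip_count_bounds() {
  assert(old_exclusive_bound() == 2147483647ull);
  assert(old_assert_accepts(2147483646ull));
  assert(!proposed_assert_accepts((uint64_t{1} << 31) + 1));
}
```

The three values 2^31-1, 2^31 and 2^31+1 are the ones where the two predicates disagree or change, which is why the checks concentrate there.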
Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - +message in assert - Move asserts around ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25295/files - new: https://git.openjdk.org/jdk/pull/25295/files/a5552ffd..fa4dd336 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25295&range=03-04 Stats: 21 lines in 2 files changed: 5 ins; 14 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25295/head:pull/25295 PR: https://git.openjdk.org/jdk/pull/25295 From mchevalier at openjdk.org Tue May 27 08:08:51 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 08:08:51 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: <8q08MdGP6Oo_brVI0kxsPYm1XSH1wYEguhJbnD0i1LI=.ea512821-a21c-453b-96e6-b64f7e5fb94d@github.com> References: <8q08MdGP6Oo_brVI0kxsPYm1XSH1wYEguhJbnD0i1LI=.ea512821-a21c-453b-96e6-b64f7e5fb94d@github.com> Message-ID: On Wed, 21 May 2025 14:48:16 GMT, Christian Hagedorn wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Drive-by comment: Were you able to extract a regression test that does not require the stress peeling flag? Rather than the endlessly growing comment, I discussed with @chhagedorn and concluded to put asserts elsewhere to make more obvious and more unbreakable some invariants we rely on. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25295#issuecomment-2911571432 From galder at openjdk.org Tue May 27 08:14:54 2025 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Tue, 27 May 2025 08:14:54 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:35:18 GMT, Roland Westrelin wrote: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. 
It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... test/hotspot/jtreg/compiler/c2/TestMismatchedAddPAfterMaxUnroll.java line 74: > 72: } > 73: if (flag) { > 74: if (flag2) { It looks a bit odd to have this if statement and the flag one. One would expect these to be dead code eliminated? Or does the DCE only happen after the problematic C2 crash? Might be useful to have some comment explaining the rationale for these if statements. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2108533887 From rcastanedalo at openjdk.org Tue May 27 08:18:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 08:18:57 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: >> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. 
But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. > > I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. @theRealAph is it OK to proceed with this PR as it is, or do you still think it would be better to extend C2 with multiple implicit null exception table entries per Mach node? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2108544177 From bkilambi at openjdk.org Tue May 27 08:20:00 2025 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 27 May 2025 08:20:00 GMT Subject: Integrated: 8355585: Aarch64: Add aarch64 backend for Float16 vector operations In-Reply-To: References: Message-ID: On Wed, 7 May 2025 14:14:14 GMT, Bhavana Kilambi wrote: > This patch adds aarch64 backend (both Neon and SVE) for FP16 vector operations - add, mul, sub, div, min, max, sqrt and fma. > > Testing: > JTREG tests - hotspot_all, jdk (tier 1-3) and langtools (tier 1) pass on aarch64 which also includes the JTREG test to test the FP16 vector operations - `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` This pull request has now been integrated. 
Changeset: 7bc315fa Author: Bhavana Kilambi Committer: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/7bc315fa6ac4e539e52b077f15c061516e208278 Stats: 1116 lines in 9 files changed: 426 ins; 0 del; 690 mod 8355585: Aarch64: Add aarch64 backend for Float16 vector operations Reviewed-by: epeter, haosun, xgong, aph ------------- PR: https://git.openjdk.org/jdk/pull/25096 From rcastanedalo at openjdk.org Tue May 27 08:22:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 08:22:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v6] In-Reply-To: References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <8BJorsTgiK1pTElabu0NZFko5n4mlpAhadlt87w_v2s=.f19e86d1-d646-46eb-860f-cbbadf37ada3@github.com> Message-ID: On Tue, 27 May 2025 07:56:46 GMT, Emanuel Peter wrote: > > Turns out there is one already: [JDK-8180290](https://bugs.openjdk.org/browse/JDK-8180290), I just added a comment there. > > Should we link it on JIRA? Sure, done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24570#issuecomment-2911615693 From jbhateja at openjdk.org Tue May 27 08:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 May 2025 08:31:57 GMT Subject: RFR: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: On Mon, 26 May 2025 14:05:42 GMT, Roberto Casta?eda Lozano wrote: >>> > > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. >>> > >>> > >>> > Have you checked that these tests exercise `ZRuntimeCallSpill` significantly? 
Most tests in that directory seem to exercise C2's generated ZGC barriers, which use other spilling/restoring logic across runtime calls (`SaveLiveRegisters`). Also, I expect the register pressure in these test cases to be minimal, so it could be good to randomize register assignment to improve the testing effectiveness. Finally, `ZRuntimeCallSpill` is typically used in slow paths, which are rarely exercised in short-lived test cases. Have you considered altering the users of `ZRuntimeCallSpill` so that they are forced to always, or at least more often, enter the slow path, for testing purposes? [This PR](https://github.com/openjdk/jdk/pull/18967) did something similar in the context of C2 ZGC barriers. >>> >>> Intel SDE allows us to collect execution traces with _-itrace_execute_emulate_ and we found quite a lot of register save/ restorations around native method, there is already an existing test point for it https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java >> >> OK, thanks for checking Jatin! >> >> Have you also checked whether, at least in some of the cases, some of the APX EGPRs are live across the runtime call (i.e. are defined before the call and used after the call), and whether the called runtime routine typically clobbers these registers? Knowing that this case is exercised in the test runs would be good to be confident about the correctness of the patch. > >> Hi @robcasloz, The patch uses new push2/pop2 instructions, which reduces dynamic instruction count needed to save and restore all the caller-saved registers. New instruction sequence based on push2/pop2 not only saves EGPRs but also existing GPRs with shorter JIT sequence. We verified our fix using the following standalone gtest with the Intel Software Development Emulator. 
>> >> [test_ZRuntimeCallSpill_cpp.txt](https://github.com/user-attachments/files/20440415/test_ZRuntimeCallSpill_cpp.txt) >> >> Given that gtests are a build-time validation and the JVM itself is built with a minimum feature set, I am hesitant to add this along with the patch. BTW, ZRuntimeCallSpill is called as part of the slow path barrier for native methods, which can modify EGPRs. >> >> Let me know if you think it's good to land in. > > Thanks for the details! Let me run some internal testing, since the PR affects spilling of non-extended registers too (due to special handling of `_result == rax`). Will come back with the results within a day or two. Thanks @robcasloz, @xmas92 and @sviswa7 for your reviews and approvals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25351#issuecomment-2911639907 From jbhateja at openjdk.org Tue May 27 08:31:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 May 2025 08:31:57 GMT Subject: Integrated: 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill In-Reply-To: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> References: <6atjHzjVheepooxryAKrfEsA13NYrCe4-sDITfTJTAM=.3fd76574-6930-439e-8e6b-0dd20e399721@github.com> Message-ID: <80JRfWJnkss2B0sKMAPjyA9YyH1UHeRNhTKX3dqNpYo=.1b2ce9e4-20be-4fd9-86a0-a947e4a127bf@github.com> On Wed, 21 May 2025 12:33:26 GMT, Jatin Bhateja wrote: > Patch spills APX EGPRs across runtime calls to slow-path barriers using PUSH2P/POP2 instructions with PPX hints. > These instructions operate over a pair of registers, resulting in smaller save/restore JIT code; on the flip side, they have hard alignment and balancing constraints, as they operate over a 16-byte aligned stack address. > ZRuntimeCallSpill is agnostic to live registers, thus the resulting SPILL sequence should not modify the contents of the registers.
> > Patch has been verified using Intel SDE all test under test/hotspot/jtreg/compiler/gcbarriers are green. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 5924c2d6 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/5924c2d6c7f636b428bc7f43abe2115af4532358 Stats: 78 lines in 1 file changed: 55 ins; 0 del; 23 mod 8357267: ZGC: Handle APX EGPRs spilling in ZRuntimeCallSpill Reviewed-by: rcastanedalo, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/25351 From mhaessig at openjdk.org Tue May 27 08:55:52 2025 From: mhaessig at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Tue, 27 May 2025 08:55:52 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow In-Reply-To: References: Message-ID: <7_PTE9pSxVmZWd2M6MRCLXuchuCrOXbGMUEEQNXVAfg=.4f4e3ef3-eee6-4d1a-85e3-ae5fc02fedd8@github.com> On Mon, 26 May 2025 12:07:08 GMT, Marc Chevalier wrote: > There is nothing very wrong here: > - the graph is not broken > - the algorithm is correct > > It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. > > The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. > > How sad stacks are still so bounded... Thanks for working on this fix, Marc. The conversion to a worklist-based loop looks good. I only found a few formatting nits. 
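The recursion-to-worklist conversion described in this thread can be sketched outside of HotSpot as follows. This is a simplified illustration and not the code from the patch: `ToyNode`, `mark_reachable` and `make_chain` are hypothetical stand-ins, a `std::vector<bool>` plays the role of the `VectorSet`, and any filtering the real `PhaseCFG::set_next_call` applies to the inputs it follows is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C2 node: an index plus input edges (may be null).
struct ToyNode {
  unsigned idx;
  std::vector<ToyNode*> inputs;
};

// Marks every node reachable from `init` through input edges and returns how
// many nodes were newly marked. An explicit worklist replaces recursion, so the
// depth of the graph no longer translates into native stack depth.
inline std::size_t mark_reachable(ToyNode* init, std::vector<bool>& visited) {
  assert(init != nullptr);
  std::vector<ToyNode*> worklist;
  worklist.push_back(init);
  std::size_t marked = 0;
  while (!worklist.empty()) {
    ToyNode* n = worklist.back();
    worklist.pop_back();
    assert(n->idx < visited.size());
    if (visited[n->idx]) continue;   // already processed (test-and-set)
    visited[n->idx] = true;
    ++marked;
    for (ToyNode* m : n->inputs) {
      if (m == nullptr) continue;    // skip missing inputs
      worklist.push_back(m);
    }
  }
  return marked;
}

// Builds a long, narrow chain: node i has node i + 1 as its only input.
// This mimics the "very deep, not wide" shape mentioned in the PR description.
inline std::vector<ToyNode> make_chain(unsigned length) {
  std::vector<ToyNode> nodes(length);
  for (unsigned i = 0; i < length; i++) {
    nodes[i].idx = i;
    if (i + 1 < length) {
      nodes[i].inputs.push_back(&nodes[i + 1]);
    }
  }
  return nodes;
}
```

With a chain of a few hundred thousand nodes, a naive recursive version of this traversal would need one stack frame per node, while the loop above only grows a heap-allocated vector.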
src/hotspot/share/opto/lcm.cpp line 800: > 798: void PhaseCFG::set_next_call(const Block* block, Node* init, VectorSet& next_call) const { > 799: Node_List worklist; > 800: worklist.push(init); I guess the only reason `init` is not `const` is because of `Node_List::push()`? src/hotspot/share/opto/lcm.cpp line 805: > 803: Node* n = worklist.pop(); > 804: if (next_call.test_set(n->_idx)) continue; > 805: for (uint i=0; i < n->len(); i++) { Suggestion: for (uint i = 0; i < n->len(); i++) { Formatting nit src/hotspot/share/opto/lcm.cpp line 807: > 805: for (uint i=0; i < n->len(); i++) { > 806: Node* m = n->in(i); > 807: if(m == nullptr) continue; // must see all nodes in block that precede call Suggestion: if (m == nullptr) continue; // must see all nodes in block that precede call Formatting nit test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 30: > 28: * @summary Triggered a stack overflow in PhaseCFG::set_next_call. The graph is legitimately big (mostly deep and not wide) > 29: * which makes the old version of PhaseCFG::set_next_call crash. > 30: * Suggestion: * @summary Triggered a stack overflow in PhaseCFG::set_next_call due to a legitimately big (mostly deep and not wide) graph. A bit more concise. Feel free to ignore. test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 66: > 64: public static void main(String[] args) { > 65: for (int i = 0; i < 400; ++i) { > 66: test (); Suggestion: test(); Formatting nit ------------- Marked as reviewed by mhaessig (Author). 
PR Review: https://git.openjdk.org/jdk/pull/25448#pullrequestreview-2870215197 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108603383 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108604728 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108605573 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108614005 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108608005 From mchevalier at openjdk.org Tue May 27 08:55:53 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 08:55:53 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow In-Reply-To: <7_PTE9pSxVmZWd2M6MRCLXuchuCrOXbGMUEEQNXVAfg=.4f4e3ef3-eee6-4d1a-85e3-ae5fc02fedd8@github.com> References: <7_PTE9pSxVmZWd2M6MRCLXuchuCrOXbGMUEEQNXVAfg=.4f4e3ef3-eee6-4d1a-85e3-ae5fc02fedd8@github.com> Message-ID: On Tue, 27 May 2025 08:43:15 GMT, Manuel Hässig wrote: >> There is nothing very wrong here: >> - the graph is not broken >> - the algorithm is correct >> >> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. >> >> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. >> >> How sad stacks are still so bounded...
> > src/hotspot/share/opto/lcm.cpp line 800: > >> 798: void PhaseCFG::set_next_call(const Block* block, Node* init, VectorSet& next_call) const { >> 799: Node_List worklist; >> 800: worklist.push(init); > > I guess the only reason `init` is not `const` is because of `Node_List::push()`? Alas, yes. `Node_List` takes non-const and spit non-const. Unfortunate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108625514 From mchevalier at openjdk.org Tue May 27 09:01:17 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 09:01:17 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v2] In-Reply-To: References: Message-ID: > There is nothing very wrong here: > - the graph is not broken > - the algorithm is correct > > It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. > > The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. > > How sad stacks are still so bounded... 
Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25448/files - new: https://git.openjdk.org/jdk/pull/25448/files/626529d5..bf04a1b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25448&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25448&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25448/head:pull/25448 PR: https://git.openjdk.org/jdk/pull/25448 From mchevalier at openjdk.org Tue May 27 09:01:17 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 09:01:17 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v2] In-Reply-To: <7_PTE9pSxVmZWd2M6MRCLXuchuCrOXbGMUEEQNXVAfg=.4f4e3ef3-eee6-4d1a-85e3-ae5fc02fedd8@github.com> References: <7_PTE9pSxVmZWd2M6MRCLXuchuCrOXbGMUEEQNXVAfg=.4f4e3ef3-eee6-4d1a-85e3-ae5fc02fedd8@github.com> Message-ID: On Tue, 27 May 2025 08:47:56 GMT, Manuel Hässig wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Address comments > > test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 30: > >> 28: * @summary Triggered a stack overflow in PhaseCFG::set_next_call. The graph is legitimately big (mostly deep and not wide) >> 29: * which makes the old version of PhaseCFG::set_next_call crash. >> 30: * > > Suggestion: > > * @summary Triggered a stack overflow in PhaseCFG::set_next_call due to a legitimately big (mostly deep and not wide) graph. > > A bit more concise. Feel free to ignore. I like it, it's more natural. I take it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108636910 From mhaessig at openjdk.org Tue May 27 09:06:52 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 27 May 2025 09:06:52 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v2] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 09:01:17 GMT, Marc Chevalier wrote: >> There is nothing very wrong here: >> - the graph is not broken >> - the algorithm is correct >> >> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. >> >> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. >> >> How sad stacks are still so bounded... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Marked as reviewed by mhaessig (Author).
------------- PR Review: https://git.openjdk.org/jdk/pull/25448#pullrequestreview-2870282874 From mdoerr at openjdk.org Tue May 27 09:12:05 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 27 May 2025 09:12:05 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: <1yOzUEBYJFMe75r2nTQYJIyk4bEia_Tx4rfT3RAG6OU=.c8cf0470-05ae-4484-b533-ef6d37a85b07@github.com> On Tue, 27 May 2025 07:46:43 GMT, Roberto Casta?eda Lozano wrote: >> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). >> >> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: >> >> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) >> >> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates. 
>> >> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. >> >> #### Testing >> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). > > Roberto Casta?eda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Include address mode test in 'legitimize_address' > - Excluded IR checks for testLoadVolatile on PPC64 > > I guess it's not worth stepping over the memory barrier. Disabling this rule for PPC64 should be ok, too. > > Thanks for testing and reporting @TheRealMDoerr, I agree that it would be too much complexity for little return. I disabled the rule for PPC64 (commit [fdf34f9](https://github.com/openjdk/jdk/commit/fdf34f905fd1ee4dde27374d66b1b7fb251e1622)). Please let me know if that works as expected. Thanks! TestImplicitNullChecks has passed on PPC64. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25066#issuecomment-2911772562 From epeter at openjdk.org Tue May 27 09:15:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 27 May 2025 09:15:01 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> Message-ID: <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> On Thu, 22 May 2025 15:20:14 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initalize`. 
Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in is own `NarrowMemProj` is >> added to the memory sugraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA split types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thank you @rwestrel for taking this on! The solution seems very reasonable, using multiple projections. Thank you also for finding that extra test case that @robcasloz asked for, it makes sure that we have a reproducer that still works when @robcasloz makes changes that would make the other reproducers not reproduce any more ;) I have plenty of small things below, nothing too big. The largest source of confusion was the `apply_to_projs` construct. 
I'll copy my comment from `apply_to_projs_any_iterator`: ---------------------- I was a little confused about this apply_to_proj construct. Is this something we already use, a familiar concept, the apply_to? > `// Iterate with i over all Proj uses calling callback` Do we really iterate over all Proj uses? It seems that is not true if callback returns true, right? This makes the semantics of the apply_to_projs a little subtle, right? A suggestion: You could expect from Callback that it does not return true/false, but some enum value that describes what happens when you return that value. Maybe ApplyToProjs::CONTINUE vs ApplyToProjs::BREAK_AND_RETURN_CURRENT_PROJ? src/hotspot/share/opto/escape.cpp line 4455: > 4453: assert(init != nullptr, "can't find Initialization node for this Allocate node"); > 4454: DUIterator i = init->outs(); > 4455: auto process_narrow_proj = [tinst, init, this, igvn](NarrowMemProjNode* proj) { In the style guide, it says: > Prefer [&] as the capture list of a lambda expression. https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md There is a lengthy section on by-value capture, which I have not studied in depth. Probably what you are doing here is correct, but the style guide also says that by-value capturing is "syntactically subtle". I guess you are only passing the lambda down, so there should be no issue... still it makes me a little nervous. What do you think? src/hotspot/share/opto/escape.cpp line 4467: > 4465: set_map(proj, new_proj); // record it so ConnectionGraph::find_inst_mem() can find it > 4466: } > 4467: return false; Can you add an in-line comment about what the `return false` means? src/hotspot/share/opto/escape.cpp line 4804: > 4802: assert(n->is_Initialize(), "We only push projections of Initialize"); > 4803: if (use->as_Proj()->_con == TypeFunc::Memory) { // Ignore precedent edge > 4804: memnode_worklist.append_if_missing(use); Do you know why we are using a `GrowableArray` here?
Would a `Unique_Node_List` not serve us better since we are always doing `append_if_missing`, which essentially has to scan the whole `GrowableArray`?

src/hotspot/share/opto/library_call.cpp line 5568:

> 5566: int klass_idx = C->get_alias_index(ary_type->add_offset(oopDesc::klass_offset_in_bytes()));
> 5567: #endif
> 5568: auto move_proj = [=](ProjNode* proj) {

Same question about by-value capture here.

src/hotspot/share/opto/library_call.cpp line 5570:

> 5568: auto move_proj = [=](ProjNode* proj) {
> 5569: int alias_idx = C->get_alias_index(proj->adr_type());
> 5570: assert(alias_idx == Compile::AliasIdxRaw || alias_idx == elemidx || alias_idx == mark_idx || alias_idx == klass_idx, "should be raw memory or array element type");

Suggestion:

    assert(alias_idx == Compile::AliasIdxRaw ||
           alias_idx == elemidx ||
           alias_idx == mark_idx ||
           alias_idx == klass_idx, "should be raw memory or array element type");

Might be more readable.

src/hotspot/share/opto/macro.cpp line 1030:

> 1028: #endif
> 1029: init->replace_mem_projs_by(mem, &_igvn);
> 1030: assert(init->outcnt() == 0, "only a control and some memory projections expected");

I'm a little confused. Where should the control or memory projections be? I suppose not below `init`... Ah, are you trying to say that there only "should have been control and memory projections, which now are all removed"?

Suggestion:

    assert(init->outcnt() == 0, "should only have had a control and some memory projections, and we removed them");

src/hotspot/share/opto/macro.cpp line 1647:

> 1645: transform_later(ctrl);
> 1646: Node* existing_raw_mem_proj = nullptr;
> 1647: auto find_raw_mem = [&, this](ProjNode* proj) {

Same question about lambda capture.
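
To make the capture question above concrete, here is a minimal standalone sketch, not HotSpot code. `apply_until` is a hypothetical stand-in for an `apply_to_projs`-style helper, and plain `int` values stand in for projections. It only illustrates the difference between `[&]` and `[=]` when a lambda is passed down to a helper and must update state owned by the caller.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an apply_to_projs-style helper: invokes
// `callback` on each value and stops at the first one for which it
// returns true. Returns a pointer to that value, or nullptr if no
// value matched.
template <typename Callback>
const int* apply_until(const std::vector<int>& values, Callback callback) {
  for (const int& v : values) {
    if (callback(v)) {
      return &v;
    }
  }
  return nullptr;
}

// [&]: the lambda updates the caller's `visited` directly, even though
// the lambda object itself is copied into apply_until.
int visits_with_reference_capture(const std::vector<int>& values, int wanted) {
  int visited = 0;
  auto find = [&](int v) {
    visited++;
    return v == wanted;  // true stops the iteration
  };
  apply_until(values, find);
  return visited;
}

// [=]: the lambda owns a private copy of `visited` (and needs `mutable`
// to modify it), so the caller's variable is never updated.
int visits_with_value_capture(const std::vector<int>& values, int wanted) {
  int visited = 0;
  auto find = [=](int v) mutable {
    visited++;
    return v == wanted;
  };
  apply_until(values, find);
  return visited;  // the caller's copy was never touched
}

// Small self-check of the behaviour described in the comments above.
void check_capture_behaviour() {
  const std::vector<int> values = {5, 7, 9};
  assert(visits_with_reference_capture(values, 7) == 2);
  assert(visits_with_value_capture(values, 7) == 0);
}
```

With by-value capture the code still compiles, but the side effect is silently lost. That is one way to read the "syntactically subtle" remark, though the style guide's own reasoning may differ.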
src/hotspot/share/opto/macro.cpp line 1657:

> 1655: Node* raw_mem_proj = new ProjNode(init, TypeFunc::Memory);
> 1656: transform_later(raw_mem_proj);
> 1657: assert(existing_raw_mem_proj != nullptr, "");

Suggestion:

    init->apply_to_projs(find_raw_mem, TypeFunc::Memory);
    assert(existing_raw_mem_proj != nullptr, "should have found it");
    Node* raw_mem_proj = new ProjNode(init, TypeFunc::Memory);
    transform_later(raw_mem_proj);

I would do the assert as early as possible.

src/hotspot/share/opto/memnode.cpp line 5483:

> 5481: return true;
> 5482: }
> 5483: return false;

Suggestion:

    return proj->adr_type() == adr_type && callback(proj->as_NarrowMemProj());

src/hotspot/share/opto/memnode.hpp line 1411:

> 1409: return true;
> 1410: }
> 1411: return false;

Suggestion:

    return proj->is_NarrowMemProj() && callback(proj->as_NarrowMemProj());

src/hotspot/share/opto/multnode.cpp line 48:

> 46: ProjNode* MultiNode::proj_out_or_null(uint which_proj) const {
> 47: assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || which_proj == (uint)true || which_proj == (uint)false, "must be 1 or 0");
> 48: assert(number_of_projs(which_proj) <= 1, "only when there's a single projection");

Does this hold for all `MultiNode`s under all circumstances? Or should we consider returning `nullptr` in this case?

src/hotspot/share/opto/multnode.cpp line 51:

> 49: auto find_proj = [which_proj, this](ProjNode* proj) {
> 50: assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || proj->Opcode() == (which_proj ? Op_IfTrue : Op_IfFalse),
> 51: "bad if #2");

What do you mean by `bad if #2`? Can you write something more descriptive?

src/hotspot/share/opto/multnode.cpp line 70:

> 68: return true;
> 69: }
> 70: return false;

Could not make a code suggestion unfortunately, as GitHub is blocked by the removal lines.
`return proj->_is_io_use == is_io_use && callback(proj);`

src/hotspot/share/opto/multnode.cpp line 257:

> 255: Compile::AliasType* atp = C->alias_type(_adr_type);
> 256: ciField* field = atp->field();
> 257: if (field) {

Suggestion:

    if (field != nullptr) {

Style guide forbids implicit null checks.

src/hotspot/share/opto/multnode.hpp line 55:

> 53: uint number_of_projs(uint which_proj, bool is_io_use) const;
> 54:
> 55: // Run callback on all Proj projection from this node

Confused. Are there non-Proj projections?

src/hotspot/share/opto/multnode.hpp line 134:

> 132: }
> 133: return nullptr;
> 134: }

I was a little confused about this `apply_to_proj` construct. Is this something we already use, a familiar concept, the `apply_to`?

Do we really iterate over `all` Proj uses? It seems that is not true if `callback` returns true, right? This makes the semantics of `apply_to_projs` a little subtle, right?

A suggestion: You could expect from `Callback` that it does not return `true/false`, but some `enum` value that describes what happens when you return that value. Maybe `ApplyToProjs::CONTINUE` vs `ApplyToProjs::BREAK_AND_RETURN_CURRENT_PROJ`?

src/hotspot/share/opto/multnode.hpp line 230:

> 228: return true;
> 229: }
> 230: return false;

Could also be simplified to a single line `return ...`

test/hotspot/jtreg/compiler/macronodes/TestEliminationOfAllocationWithoutUse.java line 2:

> 1: /*
> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved.

Is the copyright year accurate?

test/hotspot/jtreg/compiler/macronodes/TestInitializingStoreCapturing.java line 2:

> 1: /*
> 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved.

Is the copyright year accurate?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-2870115629
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108535409
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108540079
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108546984
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108558177
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108561165
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108574535
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108579647
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108586458
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108598966
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108614731
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108619844
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108620767
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108625866
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108632723
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108634420
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108650897
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108655508
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108657963
PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108659208

From epeter at openjdk.org Tue May 27 09:15:01 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 27 May 2025 09:15:01 GMT
Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8]
In-Reply-To:
<4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> <4ShW7VcaJrO0v0cHwUN1vccOH8tNPlJSIh_K0W2RdS0=.14954a26-c962-41a1-9088-2e1a1bc01eb4@github.com> Message-ID: On Tue, 27 May 2025 08:51:17 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/multnode.cpp line 51: > >> 49: auto find_proj = [which_proj, this](ProjNode* proj) { >> 50: assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || proj->Opcode() == (which_proj ? Op_IfTrue : Op_IfFalse), >> 51: "bad if #2"); > > What do you mean by `bad if #2`? Can you write something more descriptive? Ah, this was just copied. Would still be nice if it was improved ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2108622334 From thartmann at openjdk.org Tue May 27 09:37:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 27 May 2025 09:37:53 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v2] In-Reply-To: References: Message-ID: <90OanIqgFB1FIuOgFaAPwPTulSQlPUZxz71riQBPaCE=.64fa5b6f-01fc-4161-8ee6-d4c4d96f72bb@github.com> On Tue, 27 May 2025 09:01:17 GMT, Marc Chevalier wrote: >> There is nothing very wrong here: >> - the graph is not broken >> - the algorithm is correct >> >> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. 
The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. >> >> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. >> >> How sad stacks are still so bounded... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address comments Changes requested by thartmann (Reviewer). test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 27: > 25: /** > 26: * @test > 27: * @bug 8324837 Suggestion: * @bug 8357781 test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 30: > 28: * @summary Triggered a stack overflow in PhaseCFG::set_next_call due to a legitimately big (mostly deep and not wide) graph. > 29: * > 30: * @run main/othervm -Xcomp -XX:LoopUnrollLimit=8192 -XX:CompileCommand=compileonly,StackOverflowInSetNextCall::test StackOverflowInSetNextCall What happens if we set the `LoopUnrollLimit` to its max value, i.e. `max_jint / 4`? ------------- PR Review: https://git.openjdk.org/jdk/pull/25448#pullrequestreview-2870374384 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108714363 PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108713495 From mchevalier at openjdk.org Tue May 27 09:43:11 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 09:43:11 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3] In-Reply-To: References: Message-ID: > There is nothing very wrong here: > - the graph is not broken > - the algorithm is correct > > It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. 
This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. > > The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. > > How sad stacks are still so bounded... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: address comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25448/files - new: https://git.openjdk.org/jdk/pull/25448/files/bf04a1b7..9a98727d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25448&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25448&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25448.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25448/head:pull/25448 PR: https://git.openjdk.org/jdk/pull/25448 From mchevalier at openjdk.org Tue May 27 10:00:58 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 27 May 2025 10:00:58 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v2] In-Reply-To: <90OanIqgFB1FIuOgFaAPwPTulSQlPUZxz71riQBPaCE=.64fa5b6f-01fc-4161-8ee6-d4c4d96f72bb@github.com> References: <90OanIqgFB1FIuOgFaAPwPTulSQlPUZxz71riQBPaCE=.64fa5b6f-01fc-4161-8ee6-d4c4d96f72bb@github.com> Message-ID: On Tue, 27 May 2025 09:33:32 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Address comments > > test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 30: > >> 28: * @summary Triggered a stack overflow in PhaseCFG::set_next_call due to a legitimately big (mostly deep and not wide) graph. 
>> 29: * >> 30: * @run main/othervm -Xcomp -XX:LoopUnrollLimit=8192 -XX:CompileCommand=compileonly,StackOverflowInSetNextCall::test StackOverflowInSetNextCall > > What happens if we set the `LoopUnrollLimit` to its max value, i.e. `max_jint / 4`? It doesn't crash because the loops that could be unrolled are already unrolled as much as possible with this example. But what if I replace the `100` with `1000`, then it takes longer, but at some point, the node budget is exceeded and it ends well. But what if I increase the node budget like crazy (then NodeLimitFudgeFactor is too small, but I can increase that as well). Then it takes 100% CPU for a while until it crashes because it exhausts the memory limit (and in my experiments, we crash even after this, later in code generation). I could raise the memory limit, but it will be the same: either eventually it works because I can fit all the nodes and everything in memory (including my worklist), or I'll explode the limit and get terminated eventually. But that seems orthogonal to this problem: if you use options that creates many nodes, the memory might be too limited. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108764834 From thartmann at openjdk.org Tue May 27 10:06:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 27 May 2025 10:06:51 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v2] In-Reply-To: References: <90OanIqgFB1FIuOgFaAPwPTulSQlPUZxz71riQBPaCE=.64fa5b6f-01fc-4161-8ee6-d4c4d96f72bb@github.com> Message-ID: On Tue, 27 May 2025 09:57:54 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 30: >> >>> 28: * @summary Triggered a stack overflow in PhaseCFG::set_next_call due to a legitimately big (mostly deep and not wide) graph. 
>>> 29: * >>> 30: * @run main/othervm -Xcomp -XX:LoopUnrollLimit=8192 -XX:CompileCommand=compileonly,StackOverflowInSetNextCall::test StackOverflowInSetNextCall >> >> What happens if we set the `LoopUnrollLimit` to its max value, i.e. `max_jint / 4`? > > It doesn't crash because the loops that could be unrolled are already unrolled as much as possible with this example. > > But what if I replace the `100` with `1000`, then it takes longer, but at some point, the node budget is exceeded and it ends well. > > But what if I increase the node budget like crazy (then NodeLimitFudgeFactor is too small, but I can increase that as well). Then it takes 100% CPU for a while until it crashes because it exhausts the memory limit (and in my experiments, we crash even after this, later in code generation). > > I could raise the memory limit, but it will be the same: either eventually it works because I can fit all the nodes and everything in memory (including my worklist), or I'll explode the limit and get terminated eventually. But that seems orthogonal to this problem: if you use options that creates many nodes, the memory might be too limited. Right, I think that's expected. Thanks for checking! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108777197 From thartmann at openjdk.org Tue May 27 10:10:55 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 27 May 2025 10:10:55 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 09:43:11 GMT, Marc Chevalier wrote: >> There is nothing very wrong here: >> - the graph is not broken >> - the algorithm is correct >> >> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. 
This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me.
>>
>> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow.
>>
>> How sad stacks are still so bounded...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   address comment

Looks good to me.

-------------

Marked as reviewed by thartmann (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25448#pullrequestreview-2870475892

From shade at openjdk.org Tue May 27 10:36:35 2025
From: shade at openjdk.org (Aleksey Shipilev)
Date: Tue, 27 May 2025 10:36:35 GMT
Subject: RFR: 8231269: CompileTask::is_unloaded is slow due to JNIHandles type checks [v21]
In-Reply-To:
References:
Message-ID:

> [JDK-8163511](https://bugs.openjdk.org/browse/JDK-8163511) made the `CompileTask` improvement to avoid blocking class unloading if a relevant compile task is in queue. Current code does a sleight-of-hand to make sure the `method*` in `CompileTask` are still valid before using them. Still a noble goal, so we keep trying to do this.
>
> The code tries to switch weak JNI handle with a strong one when it wants to capture the holder to block unloading. Since we are reusing the same field, we have to do type checks like `JNIHandles::is_weak_global_handle(_method_holder)`. Unfortunately, that type-check goes all the way to `OopStorage` allocation code to verify the handle is really allocated in the relevant `OopStorage`. This takes internal `OopStorage` locks, and thus is slow.
>
> This issue is clearly visible in Leyden, when there are lots of `CompileTask`-s in the queue, dumped by AOT code loader.
It also does not help that `CompileTask::select_task` is effectively quadratic in number of methods in queue, so we end up calling `CompileTask::is_unloaded` very often. > > It is possible to mitigate this issue by splitting the related fields into weak and strong ones. But as Kim mentions in the bug, we should not be using JNI handles here at all, and instead go directly for relevant `OopStorage`-s. This is what this PR does, among other things that should hopefully make the whole mechanics clearer. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `compiler/classUnloading`, 100x still passes; these tests are sensitive to bugs in this code > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 36 commits: - Merge branch 'master' into JDK-8231269-compile-task-weaks - Switch to mutable - Merge branch 'master' into JDK-8231269-compile-task-weaks - More touchups - Spin lock induces false sharing - Merge branch 'master' into JDK-8231269-compile-task-weaks - Merge branch 'master' into JDK-8231269-compile-task-weaks - Rename CompilerTask::is_unloaded back to avoid losing comment context - Simplify select_for_compilation - Merge branch 'master' into JDK-8231269-compile-task-weaks - ... 
and 26 more: https://git.openjdk.org/jdk/compare/7cb6e5eb...d5e482ac

-------------

Changes: https://git.openjdk.org/jdk/pull/24018/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24018&range=20
Stats: 429 lines in 11 files changed: 389 ins; 22 del; 18 mod
Patch: https://git.openjdk.org/jdk/pull/24018.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/24018/head:pull/24018

PR: https://git.openjdk.org/jdk/pull/24018

From rcastanedalo at openjdk.org Tue May 27 11:25:55 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 27 May 2025 11:25:55 GMT
Subject: RFR: 8324720: Instruction selection does not respect -XX:-UseBMI2Instructions flag
In-Reply-To:
References:
Message-ID: <3fqlVv5KCFLrX1xglg7lIwfUPNqqHs5uKf8jb0J9WdY=.b3cdc837-8c7a-46b6-9857-4a64da6d48be@github.com>

On Fri, 23 May 2025 13:50:09 GMT, Saranya Natarajan wrote:

> While executing a function performing `a >> b` operation with `-XX:-UseBMI2Instructions` flag, the generated code contains BMI2 instruction `sarx eax,esi,edx`. The expected output should not contain any BMI2 instruction.
>
> ### Analysis and solution
>
> As suggested by @merykitty in [JDK-8324720](https://bugs.openjdk.org/browse/JDK-8324720), the initial idea was to make `VM_Version::supports_bmi2()` respect the `UseBMI2Instructions` flag by disabling the BMI2 feature when the `UseBMI2Instructions` runtime flag is explicitly set to false. This fix is similar to how other runtime flags, such as `UseAPX` and `UseAVX`, enable or disable specific code and register set. However, some test failures were encountered while running tests on this fix.
>
> The first set of failures were caused by assertion check on `VM_Version::supports_bmi2()` statement while generating some BMI2 specific instructions. This was caused by the stub generator generating AVX-512 specific code that uses these BMI2 instructions.
It should be noted that the `UseAVX` flag is set by default to the highest supported version available in x86 machine. This in turn allows AVX-512 specific code generation whenever possible. In order to not compromise the performance benefits of using AVX-512, the proposed fix only disables BMI2 feature if AVX-512 features are also disabled (or not available in the machine) along with the `UseBMI2Instructions` flag.
>
> The second failure occurred in `compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java` where a warning "_Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU_." was returned on an AMD64 machine that had support for SHA512. Looking into `compiler/testlibrary/sha/predicate/IntrinsicPredicates.java` it was found that the predicate for AMD64 was not in line with the changes introduced by [JDK-8341052](https://bugs.openjdk.org/browse/JDK-8341052) in commit [85c1aea](https://github.com/openjdk/jdk/pull/20633/commits/85c1aea90b10014aa34dfc902dff2bfd31bd70c0).

This change affects the final value of `UseBMI2Instructions` when the JVM is run with its default configuration on machines without AVX-512 (i.e. `UseAVX` <= 2), I guess that is unexpected?
My CPU flags:

$ cat /proc/cpuinfo | grep flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities

Before this change:

$ java -XX:+PrintFlagsFinal | grep UseBMI
     bool UseBMI1Instructions = true {ARCH product} {default}
     bool UseBMI2Instructions = true {ARCH product} {default}

With this change:

$ java -XX:+PrintFlagsFinal --version | grep BMI
     bool UseBMI1Instructions = true {ARCH product} {default}
     bool UseBMI2Instructions = false {ARCH product} {default}

-------------

Changes requested by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25415#pullrequestreview-2870679818

From qamai at openjdk.org Tue May 27 11:28:55 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Tue, 27 May 2025 11:28:55 GMT
Subject: RFR: 8324720: Instruction selection does not respect -XX:-UseBMI2Instructions flag
In-Reply-To:
References:
Message-ID:

On Fri, 23 May 2025 13:50:09 GMT, Saranya Natarajan wrote:

> This in turn allows AVX-512 specific code generation whenever possible. In order to not compromise the performance benefits of using AVX-512, the proposed fix only disables BMI2 feature if AVX-512 features are also disabled (or not available in the machine) along with the `UseBMI2Instructions` flag.

This seems unreasonable, you are sacrificing correctness for performance.
`UseBMI2Instructions` defaults to `true`, and if `UseBMI2Instructions` is explicitly set to `false`, the VM should respect that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25415#issuecomment-2912161550

From syan at openjdk.org Tue May 27 11:41:53 2025
From: syan at openjdk.org (SendaoYan)
Date: Tue, 27 May 2025 11:41:53 GMT
Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 27 May 2025 09:43:11 GMT, Marc Chevalier wrote:

>> There is nothing very wrong here:
>> - the graph is not broken
>> - the algorithm is correct
>>
>> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me.
>>
>> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow.
>>
>> How sad stacks are still so bounded...
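
For readers unfamiliar with the transformation the quoted description refers to, the following is a generic sketch of replacing a recursive graph traversal with a worklist-based one. It is not the code from the pull request: `Graph`, `count_reachable` and `make_chain` are hypothetical names, and nodes are plain indices rather than C2 `Node`s. The point is only that traversal depth becomes bounded by heap memory for the worklist instead of by the thread stack.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph: inputs[i] lists the indices of the nodes that
// node i points to.
using Graph = std::vector<std::vector<std::size_t>>;

// Counts the nodes reachable from `root` with an explicit worklist.
// A recursive version would use one stack frame per node on a long
// chain; here the pending nodes live in a heap-allocated vector.
std::size_t count_reachable(const Graph& inputs, std::size_t root) {
  assert(root < inputs.size());
  std::vector<bool> visited(inputs.size(), false);
  std::vector<std::size_t> worklist;
  worklist.push_back(root);
  visited[root] = true;
  std::size_t count = 0;
  while (!worklist.empty()) {
    std::size_t n = worklist.back();
    worklist.pop_back();
    count++;
    for (std::size_t in : inputs[n]) {
      if (!visited[in]) {
        visited[in] = true;  // mark on push so each node is queued once
        worklist.push_back(in);
      }
    }
  }
  return count;
}

// Builds a deep, narrow chain 0 -> 1 -> ... -> length-1, loosely
// imitating the graph shape described above.
Graph make_chain(std::size_t length) {
  Graph g(length);
  for (std::size_t i = 0; i + 1 < length; i++) {
    g[i].push_back(i + 1);
  }
  return g;
}
```

Calling `count_reachable(make_chain(200000), 0)` walks the whole chain without any recursion, so the chain length affects only the memory used by the graph and the `visited` vector.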
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   address comment

test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 65:

> 63: public static void main(String[] args) {
> 64: for (int i = 0; i < 400; ++i) {
> 65: test();

Do we need to use the return value of function `test`, to prevent the compiler from doing dead code elimination?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108959027

From mchevalier at openjdk.org Tue May 27 11:54:52 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Tue, 27 May 2025 11:54:52 GMT
Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 27 May 2025 11:39:07 GMT, SendaoYan wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   address comment
>
> test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 65:
>
>> 63: public static void main(String[] args) {
>> 64: for (int i = 0; i < 400; ++i) {
>> 65: test();
>
> Do we need to use the return value of function `test`, to prevent the compiler from doing dead code elimination?

Experimentally it's not useful since if the call was overall eliminated, it wouldn't reproduce the crash. Moreover the test uses `-XX:CompileCommand=compileonly,StackOverflowInSetNextCall::test` so `main` is not compiled so no dead code elimination can kick in. This is not even necessary: one can just force compilation of `test` (and more) without `-Xcomp` and CompileCommand just by having enough iterations of this loop. It's just not very nice as a test since it compiles a lot more things, and takes longer overall, without benefit.
Also, the code couldn't be eliminated overall: `test()` could have some side effects, it would need inlining to conclude it can be removed, and even then, it can't since `test()` assigns `d` and reads `arr`: even if nothing happens actually, I don't think it could remove everything. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2108984147 From rcastanedalo at openjdk.org Tue May 27 12:38:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 12:38:56 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control In-Reply-To: References: Message-ID: On Tue, 27 May 2025 07:56:59 GMT, Daniel Skantz wrote: > This pull request contains a fix for JDK-8356246. > > During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). > > JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. > > JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. > > The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. 
> > Testing: > T1-4. Thanks for finding, reporting, and fixing this issue Daniel! The analysis and fix look good to me, I only have a few minor comments and suggestions. src/hotspot/share/opto/stringopts.cpp line 263: > 261: // was replaced by a constant zero in a previous call to this method. > 262: // Do nothing as the transformation in the previous call ensures both are folded away. > 263: assert(bol == _stringopts->gvn()->intcon(0), "set below."); Could you make the assertion failure message more informative? E.g. something like: Suggestion: assert(bol == _stringopts->gvn()->intcon(0), "shared condition should have been set to false"); test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsSharedTest.java line 30: > 28: * is used as a shared test by two diamond Ifs in the second StringBuilder. > 29: * @run main/othervm compiler.stringopts.TestStackedConcatsSharedTest > 30: * @run main/othervm -Xbatch -XX:-TieredCompilation -Xcomp `-Xcomp` already implies `-Xbatch`: Suggestion: * @run main/othervm -XX:-TieredCompilation -Xcomp ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25461#pullrequestreview-2870772010 PR Review Comment: https://git.openjdk.org/jdk/pull/25461#discussion_r2109063255 PR Review Comment: https://git.openjdk.org/jdk/pull/25461#discussion_r2108990869 From rcastanedalo at openjdk.org Tue May 27 12:38:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 27 May 2025 12:38:57 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control In-Reply-To: References: Message-ID: On Tue, 27 May 2025 07:58:22 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8356246. 
>> >> During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). >> >> JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. >> >> JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. >> >> The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. >> >> Testing: >> T1-4. > > test/hotspot/jtreg/compiler/stringopts/TestStackedConcatsSharedTest.java line 30: > >> 28: * is used as a shared test by two diamond Ifs in the second StringBuilder. >> 29: * @run main/othervm compiler.stringopts.TestStackedConcatsSharedTest >> 30: * @run main/othervm -Xbatch -XX:-TieredCompilation -Xcomp > > Using -Xcomp for the test. Warming up with many iterations invalidated the optimization due to an unstable If. Please add this information as a comment somewhere in the test file. Perhaps as a comment above the `String.valueOf(s)` calls, where the stable if speculation happens. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25461#discussion_r2109000022 From dskantz at openjdk.org Tue May 27 12:45:13 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 27 May 2025 12:45:13 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control [v2] In-Reply-To: References: Message-ID: > This pull request contains a fix for JDK-8356246. > > During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). > > JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. > > JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. > > The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. > > Testing: > T1-4. 
Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25461/files - new: https://git.openjdk.org/jdk/pull/25461/files/77a8a5f4..5ae8e411 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25461&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25461&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/25461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25461/head:pull/25461 PR: https://git.openjdk.org/jdk/pull/25461 From dskantz at openjdk.org Tue May 27 12:49:40 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 27 May 2025 12:49:40 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control [v3] In-Reply-To: References: Message-ID: > This pull request contains a fix for JDK-8356246. > > During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). > > JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. > > JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. > > The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant.
If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. > > Testing: > T1-4. Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: test comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25461/files - new: https://git.openjdk.org/jdk/pull/25461/files/5ae8e411..b1794b74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25461&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25461&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25461.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25461/head:pull/25461 PR: https://git.openjdk.org/jdk/pull/25461 From rcastanedalo at openjdk.org Tue May 27 13:02:52 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 27 May 2025 13:02:52 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control [v3] In-Reply-To: References: Message-ID: <5YjGsw_5_DKpXKgPbgU_cTDGnKOjSz4ZWJV6BBg0Rkw=.cab481b0-5d92-4278-866e-0c0b70ec7577@github.com> On Tue, 27 May 2025 12:49:40 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8356246. >> >> During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts).
>> >> JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. >> >> JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. >> >> The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. >> >> Testing: >> T1-4. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > test comment Thanks for addressing my comments, looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25461#pullrequestreview-2870981345 From jbhateja at openjdk.org Tue May 27 14:11:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 27 May 2025 14:11:34 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v4] In-Reply-To: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: > This is a follow-up PR#22755 to improve Float16 operations inferencing. > > The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 - Enabling some test points - Adding test points and some re-factoring - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8352635 - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 - 8352635: Improve inferencing of Float16 operations with constant inputs ------------- Changes: https://git.openjdk.org/jdk/pull/24179/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24179&range=03 Stats: 336 lines in 5 files changed: 237 ins; 38 del; 61 mod Patch: https://git.openjdk.org/jdk/pull/24179.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24179/head:pull/24179 PR: https://git.openjdk.org/jdk/pull/24179 From kvn at openjdk.org Tue May 27 14:50:54 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 27 May 2025 14:50:54 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 09:43:11 GMT, Marc Chevalier wrote: >> There is nothing very wrong here: >> - the graph is not broken >> - the algorithm is correct >> >> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. >> >> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. >> >> How sad stacks are still so bounded... 
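The recursion-to-worklist change described for JDK-8357781 can be sketched as follows. This is a minimal illustration in plain Java, not the HotSpot C++ code: the `Node` class and the `countReachable` and `buildChain` helpers are hypothetical names invented for this sketch, and `PhaseCFG::set_next_call` itself is not reproduced. The point is only that an explicit worklist moves the traversal depth from the thread stack to the heap, so a very long, narrow chain of nodes no longer overflows the stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WorklistTraversal {
    // Hypothetical stand-in for a graph node; not a HotSpot class.
    static final class Node {
        final int id;
        final List<Node> inputs = new ArrayList<>();

        Node(int id) {
            this.id = id;
        }
    }

    // Visits every node reachable from root through its inputs without
    // recursion. Pending nodes live in a heap-allocated deque, so the
    // depth of the graph is not limited by the thread stack size.
    static int countReachable(Node root) {
        Deque<Node> worklist = new ArrayDeque<>();
        Set<Node> visited = new HashSet<>();
        worklist.push(root);
        visited.add(root);
        int count = 0;
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            count++;
            for (Node in : n.inputs) {
                // Set.add returns false for nodes that were already seen,
                // so shared inputs are processed only once.
                if (visited.add(in)) {
                    worklist.push(in);
                }
            }
        }
        return count;
    }

    // Builds a long, narrow chain: node i has node i + 1 as its only input.
    static Node buildChain(int length) {
        Node head = new Node(0);
        Node cur = head;
        for (int i = 1; i < length; i++) {
            Node next = new Node(i);
            cur.inputs.add(next);
            cur = next;
        }
        return head;
    }

    public static void main(String[] args) {
        // A recursive traversal of a chain this deep would typically
        // exhaust a default-sized thread stack.
        System.out.println(countReachable(buildChain(1_000_000)));
    }
}
```

The trade-off is the one named in the description: the iterative form is less elegant than the recursive one, but its memory use grows on the heap instead of the stack.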
> > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comment Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25448#pullrequestreview-2871414772 From chagedorn at openjdk.org Tue May 27 14:51:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 May 2025 14:51:56 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v5] In-Reply-To: References: Message-ID: <32NPpg_DWMt-_p7PlDpRrm93zuU3_QbgMM1DxRHp6cg=.8e678f48-6339-4760-9922-9327591260c5@github.com> On Tue, 27 May 2025 08:06:37 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - +message in assert > - Move asserts around Looks good, thanks for another iteration to improve the asserts and comments :-) ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25295#pullrequestreview-2871419116 From kvn at openjdk.org Tue May 27 14:56:52 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 27 May 2025 14:56:52 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: On Wed, 21 May 2025 08:23:14 GMT, Aleksey Shipilev wrote: > Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. > > Also tidied up some comments around it. 
> > Additional testing; > - [x] Linux x86_64 server fastdebug, `tier1 tier2` Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25343#pullrequestreview-2871438278 From kvn at openjdk.org Tue May 27 14:59:55 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 27 May 2025 14:59:55 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control [v3] In-Reply-To: References: Message-ID: <6McwkHJf9G6-WyNqvAlA2siOTvxMLbf1ZwaPEdTi_tw=.1408d5b5-cf78-44c5-87be-6b2cf117dca9@github.com> On Tue, 27 May 2025 12:49:40 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8356246. >> >> During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). >> >> JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. >> >> JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. >> >> The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. >> >> Testing: >> T1-4. 
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > test comment Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25461#pullrequestreview-2871451002 From roland at openjdk.org Tue May 27 15:01:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 May 2025 15:01:30 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v33] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). > > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. 
Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. > > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ...
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/b1da1b13..5ad3f22e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=31-32 Stats: 125 lines in 9 files changed: 85 ins; 2 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Tue May 27 15:01:30 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 May 2025 15:01:30 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 06:54:29 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopnode.cpp line 1096: > >> 1094: if (ShortRunningLongLoop) { >> 1095: add_parse_predicate(Deoptimization::Reason_short_running_long_loop, inner_head, outer_ilt, cloned_sfpt); >> 1096: } > > What happens if there are multiple loops in a method, and one traps because it is longer than expected? Do we then also re-compile the other loop without the new predicate? FTR, there's a bug here: https://bugs.openjdk.org/browse/JDK-8350330 The count of traps of this type for this method needs to exceed `PerMethodTrapLimit` for the speculation to be disabled. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2109437663 From roland at openjdk.org Tue May 27 15:18:02 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 May 2025 15:18:02 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: <4ejvO8emCHE-QpMRGYDht30RlDjfSZh86tLHq7uzYpM=.f43a8e15-353b-4fcd-b342-728f3affa0f1@github.com> References: <4ejvO8emCHE-QpMRGYDht30RlDjfSZh86tLHq7uzYpM=.f43a8e15-353b-4fcd-b342-728f3affa0f1@github.com> Message-ID: On Tue, 27 May 2025 07:22:17 GMT, Emanuel Peter wrote: >> Hmm, or does the `int` `bt` come from `StressLongCountedLoop`, which complicates things here? > > Could `iters_limit` be constant? > Are only long-loops allowed, or can there be int loops here too? You talk about long range checks, which makes me believe this is about long loops. But then below you check for the `bt` of the head, so could that be int? No. There are 2 cases: 1- a long counted loop that may or may not have long range checks 2- an int counted loop that has long range checks With 2-, the counted loop is turned into a 2 loops nest. That's required to transform the long range checks into int range checks so they can be optimized by range check elimination. This is not new in this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2109479394 From kvn at openjdk.org Tue May 27 15:20:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 27 May 2025 15:20:59 GMT Subject: RFR: 8357473: Compilation spike leaves many CompileTasks in free list [v2] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 17:34:36 GMT, Aleksey Shipilev wrote: >> See bug for more discussion. >> >> This PR implements the "all the way" solution by removing the free list completely. It complements https://github.com/openjdk/jdk/pull/25364, and can go either first, or second. 
We will remerge the other one once either integrates. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `compiler` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into JDK-8357473-compile-task-free-list > - Also free the lock! > - Comments and indenting > - Basic deletion My testing passed. But let's wait for the AOT JEPs integration. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25409#pullrequestreview-2871531892 From chagedorn at openjdk.org Tue May 27 15:26:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 27 May 2025 15:26:52 GMT Subject: RFR: 8354383: C2: enable sinking of Type nodes out of loop [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 08:36:37 GMT, Roland Westrelin wrote: >> `PhaseIdealLoop::try_sink_out_of_loop()` excludes `Type` nodes because >> we ran into some issues where a `Type` node is sunk and then becomes >> `top` but the control path of its uses doesn't become unreachable. >> >> 8349479 should have fixed that so that exception no longer makes >> sense. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn I'm not sure either, we would need to further investigate if we can find cases that benefit from it. Should we file an RFE either way?
------------- PR Comment: https://git.openjdk.org/jdk/pull/25396#issuecomment-2913004024 From roland at openjdk.org Tue May 27 15:34:21 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 May 2025 15:34:21 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v28] In-Reply-To: References: <7r3C8BAViyHKVVJjv4w0YxfIUkfk9PmY0OEt73V_aRI=.baf51fc4-d996-44d0-a1f5-10cf6dc4de8d@github.com> Message-ID: <5MNXUlpqPx6eio4BcGF8tPYxd89k63RPxJ36okkemYQ=.c2c00938-eff3-4a00-92bb-044e6257616f@github.com> On Tue, 27 May 2025 07:54:33 GMT, Emanuel Peter wrote: >>> A few minor last comments but otherwise, it looks good to me! Thanks for all the updates, the patience and the credit! >> >> Thanks for the careful review. I applied your suggestions. > > @rwestrel Let me know if you want us to run some extra testing. Christian said that you might be planning to wait until the JDK26 fork, and merge then, and then we can run testing. Up to you :) Thanks for the careful review @eme64 . I think I addressed your comments in the new commit. > We should definitively hold off until JDK26 though, so we have enough time to fix possible follow-ups. Sure. Fine with me. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2913026181 From roland at openjdk.org Tue May 27 15:34:22 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 27 May 2025 15:34:22 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v32] In-Reply-To: References: Message-ID: <_KeVJgVN1At8EIsq7rNG96MM8gXujQLVpftKI-yb9EI=.66fd8284-9173-474f-9019-ff75fcadfb13@github.com> On Tue, 27 May 2025 07:33:05 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopnode.cpp line 1278: > >> 1276: #endif >> 1277: entry_control = head->skip_strip_mined()->in(LoopNode::EntryControl); >> 1278: } else if (bt == T_LONG) { > > Suggestion: > > } else if (bt == T_LONG) { > assert(known_short_running_loop, "follows from earlier bailout check."); > > I suppose that is what you mean by `won't need loop limit checks (iters_limit guarantees that)`, right? The `LongCountedLoop` is turned into a `Loop` here which, on a following pass of loop opts, is turned into a `CountedLoop`. That `CountedLoop` doesn't need loop limit check predicates because of the way its bounds are constructed. But, sometimes, by the time the `CountedLoop` is created, some transformation happened to the bounds and the code that creates the `CountedLoop` can't tell the loop limit checks are not needed. The `Cast` is added to make sure the `CountedLoop` logic sees the bounds are such that the loop limit checks should not be added. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r2109517371 From aph at openjdk.org Tue May 27 15:36:14 2025 From: aph at openjdk.org (Andrew Haley) Date: Tue, 27 May 2025 15:36:14 GMT Subject: RFR: 8354674: AArch64: Intrinsify Unsafe::setMemory [v8] In-Reply-To: References: <4LLR5zxDlX1kFvbC9wHErVh6IGD1fH3fponKnlSaICg=.62e5e428-f00b-4fc3-8f1d-973639eceac2@github.com> Message-ID: On Thu, 15 May 2025 16:03:44 GMT, Andrew Haley wrote:
>> This intrinsic is generally faster than the current implementation for Panama segment operations for all writes larger than about 8 bytes in size, increasing to more than 2* the performance on larger memory blocks on Graviton 2, between "panama" (C2 generated, what we use now) and "unsafe" (this intrinsic).
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt     Score     Error  Units
>> MemorySegmentFillUnsafe.panama       true  262143  avgt   10  7295.638 ±   0.422  ns/op
>> MemorySegmentFillUnsafe.panama      false  262143  avgt   10  8345.300 ±  80.161  ns/op
>> MemorySegmentFillUnsafe.unsafe       true  262143  avgt   10  2930.594 ±   0.180  ns/op
>> MemorySegmentFillUnsafe.unsafe      false  262143  avgt   10  3136.828 ±   0.232  ns/op
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Copyright format correction > One thing that sometimes helps is a count leading zeroes followed by a multiway switch at the start, or just before the tail, to get started at the right place in the tail (its log-size cascade), for very small inputs. > > This PR #25383 uses clz in that way. > > It also uses an overlapping-store technique to reduce an O(lg N) tail to an O(1) tail, which also depends on the clz step. > My rough notes on the relative performance of overlapping loads and stores are here FWIW: https://cr.openjdk.org/~jrose/jvm/PartialMemoryWord.cpp Mmm, interesting.
I had a look at the M1 timings to see what might be going on, and I think it's because the processor can in each clock execute 8 instructions but only one taken branch. It can, however, execute two not-taken branches per clock. At present, if our block is 1 (mod 64) bytes long, then we have a string of 5 taken branches and 1 not-taken branch. However, I couldn't see anything in the JMH results. I realized on closer inspection that performance was very much limited by the C2-generated caller, which is doing far more work than the intrinsic itself. I eventually tweaked the benchmark to call the intrinsic 1000 times, and trivially converting us to ns. I'm not keen on the overlapping-store technique in this case because the code gets IMHO unjustifiably complex, but also we would have different timing behaviour for differently aligned fill operations. This seems to me a bit much for something that should be fairly simple. I did, however, implement the clz-optimized tail. It's great for very short strings (mod 64) but it's worse for the range 32...63 (mod 64). It's also missing the early exit from the log-size cascade, which short-circuits fills that are a whole number of words long. I tried another thing, which was to have _two_ cascades: one of whole-word-sized stores, and one from 0 to 7 bytes. This was better for fills that are a whole number of words long and some other cases, but had its own timing spikes in a few places (e.g. 36 bytes.) I measured the total time for arrays of size 1-128 bytes, and took the average throughput. A: this PR, as checked in. 6.5 cycles/op, 1.8ns. B: one clz-optimized tail. 6.9 cycles/op, 1.9ns. C: two clz-optimized tails. 7.2 cycles/op, 2.0ns. In conclusion: there isn't much in it. We could do better by keeping this code as short as possible, which would allow us to inline the whole thing into its caller rather than the palaver of a C-ABI call to the stub. 
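For readers unfamiliar with the terms used in this message, the following is a rough sketch of a log-size cascade for a fill tail of fewer than 64 bytes, plus a variant that uses a leading-zero count to enter the cascade at the first chunk size that is actually needed. It is written in plain Java for illustration only: the class and method names are hypothetical, `Arrays.fill` stands in for the fixed-width stores of the real stub, and nothing here is taken from the AArch64 intrinsic or from PR #25383.

```java
import java.util.Arrays;

final class FillTailSketch {
    // Plain cascade: test each power of two from 32 down to 1 and store a
    // chunk of that size when the corresponding bit of len is set.
    static void fillTailCascade(byte[] dst, int off, int len, byte value) {
        checkTail(len);
        for (int chunk = 32; chunk >= 1; chunk >>= 1) {
            if ((len & chunk) != 0) {
                Arrays.fill(dst, off, off + chunk, value);
                off += chunk;
            }
        }
    }

    // clz variant: the leading-zero count of len gives its highest set bit,
    // so the cascade can start there and skip the tests for larger chunks.
    static void fillTailClz(byte[] dst, int off, int len, byte value) {
        checkTail(len);
        if (len == 0) {
            return; // nothing to store
        }
        int first = 1 << (31 - Integer.numberOfLeadingZeros(len));
        for (int chunk = first; chunk >= 1; chunk >>= 1) {
            if ((len & chunk) != 0) {
                Arrays.fill(dst, off, off + chunk, value);
                off += chunk;
            }
        }
    }

    private static void checkTail(int len) {
        if (len < 0 || len >= 64) {
            throw new IllegalArgumentException("tail length must be in [0, 64): " + len);
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[64];
        fillTailClz(buf, 0, 37, (byte) 0x7f); // 37 = 32 + 4 + 1
        System.out.println(Arrays.toString(Arrays.copyOf(buf, 40)));
    }
}
```

In this sketch a tail of length 1 still walks through every larger chunk size in the plain cascade before reaching the 1-byte step, which is the kind of small-input cost that the clz entry point is meant to avoid.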
------------- PR Comment: https://git.openjdk.org/jdk/pull/25147#issuecomment-2913030626 From mhaessig at openjdk.org Tue May 27 17:32:13 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Tue, 27 May 2025 17:32:13 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences Message-ID: ## Summary On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result:

    ; OptoAssembly
    03d     decode_heap_oop_not_null R8,R10
    041     leaq    R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32

    ; x86
    0x00007f1f210625bd: lea (%r12,%r10,8),%r8       ; result is unused
    0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10   ; the same computation as decode, but with offset

This PR adds a peephole optimization to remove such redundant `lea`s. ## The Issue in Detail The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes

    LoadN -> decodeHeapOop_not_null -> leaP*
         ______________________________^

where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode.
Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. This leaves us with a handful of possible solutions: 1. implement narrow bases for derived oops in oop maps, 2. perform some dead code elimination after we know which oops are part of oop maps, 3. add a peephole optimization to simply remove unused `lea`s. Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a single lea is a bit overkill. Because the contents of oop maps are not definitive until after global code motion or the register allocator, we might as well just do a peephole instead of performing more DCE, since this only affects x86. So this PR introduces that peephole. ## Changes This PR - adds an x86 peephole optimization to remove `decodeHeapOop_not_null`s with unused results, - adds a regression IR test with positive and negative tests from all reproducers for this issue, and - adds a microbenchmark to see the effect of the peephole. The peephole is a bit more powerful than just removing a decode with an unused result preceding a `leaP*`. 
The peephole can also remove the decode if multiple `leaP*`s have it as base but its result is otherwise unused. Further, if the removal of a decode will lead to a redundant `MemToRegSpillCopy`, that spill copy will also be removed. ## Microbenchmark Results

    Benchmark                                            Mode  Cnt  Baseline    Error  Peephole    Error  Speedup  Units
    RedundantLeaPeephole.benchStoreNNoAllocParallel      avgt   30     1.471  ± 0.146     1.374  ± 0.056    7.06%  ns/op
    RedundantLeaPeephole.benchStoreNNoAllocSerial        avgt   30     1.454  ± 0.059     1.345  ± 0.046    8.10%  ns/op
    RedundantLeaPeephole.benchStoreNRemoveSpillParallel  avgt   30    10.789  ± 0.307    10.537  ± 0.302    2.39%  ns/op
    RedundantLeaPeephole.benchStoreNRemoveSpillSerial    avgt   30    11.364  ± 0.240    11.206  ± 0.165    1.41%  ns/op
    RedundantLeaPeephole.benchStringEquals               avgt   30     1.355  ± 0.054     1.23   ± 0.033   10.16%  ns/op

Discussion of microbenchmark results

The `benchStringEquals` and `benchStoreNNoAlloc*` benchmarks both remove two `lea` instructions and thus exhibit similar speedups. The `benchStoreNRemoveSpill*` benchmarks remove one `lea` and one `mov` from a `MemToRegSpillCopy`, so one would expect a higher speedup. This is the case in absolute numbers, but it is less significant relative to the cost of the allocations in these benchmarks. The allocations would also explain the higher errors for the `RemoveSpill` benchmarks.
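For readers less familiar with compressed oops, here is a minimal Java sketch of why the decode is redundant once a `leaP*` has been matched. It is not code from this PR; it only models the address arithmetic. The heap base, the shift of 3, and all class and method names are assumptions chosen for illustration.

```java
public class DecodeFoldSketch {
    // Hypothetical values, chosen only for illustration.
    static final long HEAP_BASE = 0x0000_0008_0000_0000L; // assumed heap base
    static final int SHIFT = 3;                           // assumed compressed-oop shift

    // Models a decode of a non-null narrow oop: base + (narrow << shift).
    static long decode(int narrowOop) {
        return HEAP_BASE + ((narrowOop & 0xFFFF_FFFFL) << SHIFT);
    }

    // Models the unfolded shape: decode first, then add the field offset.
    static long addressViaDecode(int narrowOop, long offset) {
        long oop = decode(narrowOop);
        return oop + offset;
    }

    // Models the folded shape: one address computation straight from the narrow oop.
    // The decoded oop is never materialized, so a separate decode has no user here.
    static long addressViaLea(int narrowOop, long offset) {
        return HEAP_BASE + ((narrowOop & 0xFFFF_FFFFL) << SHIFT) + offset;
    }

    public static void main(String[] args) {
        int narrow = 0x1234_5678;
        long offset = 16;
        // Both shapes compute the same address.
        System.out.println(addressViaDecode(narrow, offset) == addressViaLea(narrow, offset));
    }
}
```

In this model the separate decode only matters when something else needs the full-width oop, which corresponds to the oop map case described above.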
## Testing - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15281434572) - [ ] tier1 through tier2 plus Oracle internal testing for all Oracle supported platforms and OSs - [ ] tier3 through tier5 plus Oracle internal testing for x86 on all supported OSs ## Acknowledgements My thanks go out to @robcasloz for introducing me to the backend, answering my questions, and discussing this issue with me. ------------- Commit messages: - Add microbenchmark - Add peephole to remove redundant leas - Add regression test - Remove trailing spaces Changes: https://git.openjdk.org/jdk/pull/25471/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25471&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8020282 Stats: 723 lines in 6 files changed: 720 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/25471.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25471/head:pull/25471 PR: https://git.openjdk.org/jdk/pull/25471 From duke at openjdk.org Tue May 27 18:48:15 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 27 May 2025 18:48:15 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v17] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Only exclude JVMCI methods that contain a mirror ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/398a4dc4..edefbf62 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=15-16 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Tue May 27 18:52:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 27 May 2025 18:52:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v16] In-Reply-To: References: Message-ID: <9cNHauTDO-bNBXljLW_q8_NvqjHHSBwv-aG52O-1zdo=.adb08e05-6c64-41f5-b730-8a5f2ca87711@github.com> On Fri, 23 May 2025 17:11:48 GMT, Tom Rodriguez wrote: > I'd like to clarify a bit what's actually done here. Some JVMCI compilations can have an associated instance of InstalledCode that has values written into it by HotSpot that point at the nmethod* and the verified entry point. If the mirror object is reclaimed by the garbage collector before the nmethod dies, the mirror field will be cleared. Graal may read those fields but will never write them. JVMCI compilations initiated by the CompileBroker will never have an associated mirror. The mirror object is associated with the method at construction time and will never be changed. So it's not necessary to exclude all JVMCI compiled nmethods from this relocation, only ones which have a non-null mirror object. Thanks for this clarification. I have updated it to only exclude nmethods with mirrors. I want to confirm that it is okay to pass false for `phantom_ref` in `get_nmethod_mirror`.
We don't read or write to the mirror and only call that to see if one exists so I believe it should be okay. Another option is to add a function to return the mirror_index and that can be used as the check instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2913591178 From vlivanov at openjdk.org Tue May 27 19:22:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 27 May 2025 19:22:52 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch In-Reply-To: References: Message-ID: <-1ImNaQ5oCLWnqgqU3e7W9bA9c0xjHW2ctqoDCoOukE=.94d36cac-4301-4492-949d-be50f813d08a@github.com> On Wed, 21 May 2025 08:23:14 GMT, Aleksey Shipilev wrote: > Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. > > Also tidied up some comments around it. > > Additional testing; > - [x] Linux x86_64 server fastdebug, `tier1 tier2` src/hotspot/cpu/x86/templateTable_x86.cpp line 1690: > 1688: void TemplateTable::branch(bool is_jsr, bool is_wide) { > 1689: __ get_method(rcx); // rcx holds method > 1690: __ profile_taken_branch(rax); // rax holds updated MDP Stale comment at line 1741? Otherwise, looks good. 
if (UseLoopCounter) { // increment backedge counter for backward branches // rax: MDO // rbx: MDO bumped taken-count ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25343#discussion_r2110003697 From never at openjdk.org Tue May 27 19:25:58 2025 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 27 May 2025 19:25:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v17] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 18:48:15 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Only exclude JVMCI methods that contain a mirror We'll address this flag in https://bugs.openjdk.org/browse/JDK-8357619 as I think it no longer serves a purpose other than confusion. Passing false should be correct, but I think we'll probably end up adding this: diff --git a/src/hotspot/share/jvmci/jvmciRuntime.hpp b/src/hotspot/share/jvmci/jvmciRuntime.hpp index 884d11f792e..0efc957aa88 100644 --- a/src/hotspot/share/jvmci/jvmciRuntime.hpp +++ b/src/hotspot/share/jvmci/jvmciRuntime.hpp @@ -117,6 +117,11 @@ class JVMCINMethodData : public ResourceObj { // Gets the JVMCI name of the nmethod (which may be null). 
const char* name() { return _has_name ? (char*)(((address) this) + sizeof(JVMCINMethodData)) : nullptr; } + // Returns true if this nmethod has a mirror + bool has_mirror() const { + return _nmethod_mirror_index != -1; + } + // Clears the HotSpotNmethod.address field in the mirror. If nm // is dead, the HotSpotNmethod.entryPoint field is also cleared. void invalidate_nmethod_mirror(nmethod* nm); which you could adopt and use for your change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2913746463 From duke at openjdk.org Tue May 27 21:44:46 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 27 May 2025 21:44:46 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Add JVMCINMethodData::has_mirror() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/edefbf62..a0134a87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=16-17 Stats: 8 lines in 2 files changed: 5 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Tue May 27 21:44:47 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 27 May 2025 21:44:47 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v17] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 18:48:15 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Only exclude JVMCI methods that contain a mirror All previously raised concerns have been addressed. When you have a moment, I'd appreciate a review. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2914151812 From iveresov at openjdk.org Tue May 27 21:57:11 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Tue, 27 May 2025 21:57:11 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v25] In-Reply-To: References: Message-ID: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: - Merge branch 'master' into pp2 - Missing part of the merge - Merge branch 'master' into pp2 - Merge branch 'master' into pp2 - 8357284: runtime/cds/appcds/aotProfile/AOTProfileFlags.java fails on non-debug platform - 8357283: compiler/debug/TestStressBailout.java hangs when running with AOT cache - Merge branch 'master' into pp2 - Address Ioi's comments - Merge branch 'master' into pp2 - Address Ioi's comments - ... 
and 80 more: https://git.openjdk.org/jdk/compare/2e8b195a...ed213368 ------------- Changes: https://git.openjdk.org/jdk/pull/24886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24886&range=24 Stats: 3324 lines in 59 files changed: 3111 ins; 100 del; 113 mod Patch: https://git.openjdk.org/jdk/pull/24886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24886/head:pull/24886 PR: https://git.openjdk.org/jdk/pull/24886 From duke at openjdk.org Tue May 27 23:16:00 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 27 May 2025 23:16:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:44:46 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add JVMCINMethodData::has_mirror() I filed [JDK-8357926](https://bugs.openjdk.org/browse/JDK-8357926) to track progress for allowing JVMCI nmethods with mirrors to be relocated ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2914405959 From fjiang at openjdk.org Wed May 28 01:32:51 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 28 May 2025 01:32:51 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v4] In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Tue, 27 May 2025 07:06:45 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! >> >> Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. >> >> ### Testing >> * [x] Linux riscv64 server release build on SG2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Marked as reviewed by fjiang (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/25438#pullrequestreview-2873247547 From fyang at openjdk.org Wed May 28 01:44:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Wed, 28 May 2025 01:44:56 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v4] In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Tue, 27 May 2025 07:06:45 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! 
>> >> Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. >> >> ### Testing >> * [x] Linux riscv64 server release build on SG2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Thanks for the update. Latest version looks fine. I built fastdebug build and performed `jdk_vector`, `compiler/vectorization` and `compiler/vectorapi` tests using qemu-system (with rvv vlen=256). Result looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25438#pullrequestreview-2873267658 From dzhang at openjdk.org Wed May 28 02:11:03 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 28 May 2025 02:11:03 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v4] In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Tue, 27 May 2025 07:06:45 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! >> >> Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. >> >> ### Testing >> * [x] Linux riscv64 server release build on SG2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo Thanks all for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/25438#issuecomment-2914664284 From duke at openjdk.org Wed May 28 02:11:03 2025 From: duke at openjdk.org (duke) Date: Wed, 28 May 2025 02:11:03 GMT Subject: RFR: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector [v4] In-Reply-To: References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Tue, 27 May 2025 07:06:45 GMT, Dingli Zhang wrote: >> Hi all, >> Please take a look and review this PR, thanks! >> >> Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. >> >> ### Testing >> * [x] Linux riscv64 server release build on SG2042 > > Dingli Zhang has updated the pull request incrementally with one additional commit since the last revision: > > Fix typo @DingliZhang Your change (at version 7a0e10df6353db09dae4a7eb0756b74756ae3b30) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25438#issuecomment-2914665145 From dzhang at openjdk.org Wed May 28 02:29:56 2025 From: dzhang at openjdk.org (Dingli Zhang) Date: Wed, 28 May 2025 02:29:56 GMT Subject: Integrated: 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector In-Reply-To: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> References: <5XUDoYl5ywYR2LRdiEUlcfXCAXoWD0Ls7uewvoGIsHE=.98f21012-5cb0-4f63-a24e-1bab668de05e@github.com> Message-ID: On Mon, 26 May 2025 02:52:01 GMT, Dingli Zhang wrote: > Hi all, > Please take a look and review this PR, thanks! 
> > Currently, the match_rule_supported function in riscv.ad contains checks for vector-related intrinsics (e.g., FmaVF, FmaVD, RoundVF, RoundVD). These checks can be centralized into the match_rule_supported_vector function in the riscv_v.ad file, ensuring consistent handling in their appropriate context. > > ### Testing > * [x] Linux riscv64 server release build on SG2042 This pull request has now been integrated. Changeset: 96fb31e2 Author: Dingli Zhang Committer: Feilong Jiang URL: https://git.openjdk.org/jdk/commit/96fb31e2dbc16875c6c8183096cd03f30d0632ee Stats: 29 lines in 2 files changed: 14 ins; 15 del; 0 mod 8357695: RISC-V: Move vector intrinsic condition checks into match_rule_supported_vector Reviewed-by: fyang, fjiang ------------- PR: https://git.openjdk.org/jdk/pull/25438 From shade at openjdk.org Wed May 28 07:26:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 07:26:42 GMT Subject: RFR: 8357434: x86: Simplify Interpreter::profile_taken_branch [v2] In-Reply-To: References: Message-ID: > Noticed that `Interpreter::profile_taken_branch` has the same `sbbptr` pattern we have eliminated with [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946). The same logic applies here: the counter is 64-bit, never practically overflows, and no other code cares about it. > > Also tidied up some comments around it. > > Additional testing; > - [x] Linux x86_64 server fastdebug, `tier1 tier2` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Stale comment - Merge branch 'master' into JDK-8357434-x86-profile-taken - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25343/files - new: https://git.openjdk.org/jdk/pull/25343/files/6dfa2db9..816b7af7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25343&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25343&range=00-01 Stats: 60495 lines in 967 files changed: 40475 ins; 14309 del; 5711 mod Patch: https://git.openjdk.org/jdk/pull/25343.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25343/head:pull/25343 PR: https://git.openjdk.org/jdk/pull/25343 From rcastanedalo at openjdk.org Wed May 28 07:36:56 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 28 May 2025 07:36:56 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> Message-ID: On Thu, 22 May 2025 15:20:14 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory. In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, The `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. 
If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other. So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`.
>> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review In the common case where allocations are not eliminated, matching transforms the introduced `NarrowMemProj` nodes into a sequence of redundant, raw `MemProj` nodes, see e.g. B6 here: [after-gcm.pdf](https://github.com/user-attachments/files/20477560/after-gcm.pdf). Would it be possible to clean them up during matching (or perhaps already during, or right after, macro expansion)? ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24570#pullrequestreview-2873937067 From epeter at openjdk.org Wed May 28 07:48:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 07:48:53 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: On Mon, 26 May 2025 07:15:31 GMT, Jasmine Karthikeyan wrote: > Hi all, > This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. > > The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. 
> > I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! @jaskarth Thanks for looking at this! How do we know that we have all relevant instructions in the list of `can_subword_truncate`? It seems some shift operations should also be allowed, at least left shift. I also wonder if there could be ways to truncate long-operations, and we would have to list those as well in `can_subword_truncate`? I was wondering if we maybe needed a few more tests, given my comments in the `TestShort.java` attached in JBS: // While we are at it, we should also have tests for this, even though it currently does not vectorize, // but it may in the future and then we have to catch the truncation. // out[i] = (short)Long.bitCount(a[i]); //out[i] = (short)Integer.rotateLeft(a[i], b[i]); And just for good measure: should we also add tests for `char`? src/hotspot/share/opto/superword.cpp line 2496: > 2494: int opc = in->Opcode(); > 2495: return opc == Op_AddI || opc == Op_SubI || opc == Op_MulI || opc == Op_AndI || opc == Op_OrI || opc == Op_XorI > 2496: || opc == Op_ReverseBytesS || opc == Op_ReverseBytesUS; A switch might look nicer here, and be easier to extend later on ;) src/hotspot/share/opto/superword.cpp line 2553: > 2551: const Type* vt = vtn; > 2552: int op = in->Opcode(); > 2553: if (!can_subword_truncate(in)) { It seems `can_subword_truncate` does not cover `VectorNode::is_shift_opcode`, is that correct? Maybe we are missing IR tests that catch this, scary! 
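To make the truncation concern concrete, here is a minimal Java sketch (not taken from the patch or its tests). It contrasts an operation whose low 16 bits do not depend on the upper input bits with one whose result does. The class and method names are hypothetical, and the "lane" methods only model what a 16-bit vector lane would compute if the input were truncated first.

```java
public class SubwordTruncationSketch {
    // Java semantics: operands are promoted to int, the sum is narrowed to short.
    static short addThenTruncate(short a, short b) {
        return (short) (a + b);
    }

    // Model of a 16-bit lane: only the low 16 bits of each input take part.
    static short addInShortLane(short a, short b) {
        int lowA = a & 0xFFFF;
        int lowB = b & 0xFFFF;
        return (short) ((lowA + lowB) & 0xFFFF);
    }

    // Java semantics: the short is sign-extended to int, zeros are counted over 32 bits.
    static short nlzJava(short a) {
        return (short) Integer.numberOfLeadingZeros(a);
    }

    // Model of a 16-bit lane: zeros are counted over 16 bits only.
    static short nlzInShortLane(short a) {
        return (short) (Integer.numberOfLeadingZeros(a & 0xFFFF) - 16);
    }

    public static void main(String[] args) {
        short x = 1;
        // Addition tolerates truncation: both models agree.
        System.out.println(addThenTruncate(x, x) == addInShortLane(x, x));
        // numberOfLeadingZeros does not: 31 with int semantics, 15 in a 16-bit lane.
        System.out.println(nlzJava(x) + " vs " + nlzInShortLane(x));
    }
}
```

The second pair is the kind of mismatch an allow-list such as `can_subword_truncate` has to rule out; whether a given opcode belongs on that list is exactly the question raised above.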
test/hotspot/jtreg/compiler/vectorization/TestSubwordTruncation.java line 64: > 62: > 63: // Shorts > 64: Suggestion: Nit: you don't have a similar comment for other types, so just drop it here too ;) ------------- PR Review: https://git.openjdk.org/jdk/pull/25440#pullrequestreview-2873934331 PR Comment: https://git.openjdk.org/jdk/pull/25440#issuecomment-2915316621 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2111162309 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2111153772 PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2111155491 From epeter at openjdk.org Wed May 28 07:48:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 07:48:54 GMT Subject: RFR: 8350177: C2 SuperWord: Integer.numberOfLeadingZeros, numberOfTrailingZeros, reverse and bitCount have input types wrongly turncated for byte and short In-Reply-To: References: Message-ID: <9d8EW1k2YAwyyeLvIG5Fnqpjx-3PdrnBq_bildM8jsE=.a3fd55f7-5586-4302-8a14-c2d251cf6fe4@github.com> On Wed, 28 May 2025 07:38:16 GMT, Emanuel Peter wrote: >> Hi all, >> This patch fixes cases in SuperWord when compiling subword types where vectorized code would be given a narrower type than expected, leading to miscompilation due to truncation. This fix is a generalization of the same fix applied for `Integer.reverseBytes` in [JDK-8305324](https://bugs.openjdk.org/browse/JDK-8305324). The patch introduces a check for nodes that are known to tolerate truncation, so that any future cases of subword truncation will avoid creating miscompiled code. >> >> The patch reuses the existing logic to set the type of the vectors to int, which currently disables vectorization for the affected patterns entirely. Once [JDK-8342095](https://bugs.openjdk.org/browse/JDK-8342095) is merged and automatic casting support is added the autovectorizer should automatically insert casts to and from int, maintaining correctness. 
>> >> I've added an IR test that checks for correctly compiled outputs. Thoughts and reviews would be appreciated! > > src/hotspot/share/opto/superword.cpp line 2496: > >> 2494: int opc = in->Opcode(); >> 2495: return opc == Op_AddI || opc == Op_SubI || opc == Op_MulI || opc == Op_AndI || opc == Op_OrI || opc == Op_XorI >> 2496: || opc == Op_ReverseBytesS || opc == Op_ReverseBytesUS; > > A switch might look nicer here, and be easier to extend later on ;) This list is a little scary... how do we know that we have all cases in it, and we are not getting regressions because we are missing some? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25440#discussion_r2111167881 From mchevalier at openjdk.org Wed May 28 07:55:04 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 28 May 2025 07:55:04 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 09:43:11 GMT, Marc Chevalier wrote: >> There is nothing very wrong here: >> - the graph is not broken >> - the algorithm is correct >> >> It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. >> >> The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. >> >> How sad stacks are still so bounded... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address comment Thank you all for review! 
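As a generic illustration of the recursion-to-worklist change described above, here is a minimal Java sketch. It is not the HotSpot code (which is C++ and walks the C2 graph); `Node`, `countReachable`, and `buildChain` are hypothetical names. The point is only that an explicit worklist moves the traversal depth from the thread stack to the heap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WorklistTraversalSketch {
    // Hypothetical graph node with outgoing edges and a visited mark.
    static final class Node {
        final List<Node> outs = new ArrayList<>();
        boolean visited;
    }

    // Iterative depth-first walk: the explicit deque replaces the call stack,
    // so a very long, narrow chain cannot overflow the thread stack.
    static int countReachable(Node root) {
        Deque<Node> worklist = new ArrayDeque<>();
        worklist.push(root);
        int count = 0;
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            if (n.visited) {
                continue; // may have been pushed more than once
            }
            n.visited = true;
            count++;
            for (Node out : n.outs) {
                if (!out.visited) {
                    worklist.push(out);
                }
            }
        }
        return count;
    }

    // Builds a chain of the given length, the shape that heavy unrolling can produce.
    static Node buildChain(int length) {
        Node head = new Node();
        Node cur = head;
        for (int i = 1; i < length; i++) {
            Node next = new Node();
            cur.outs.add(next);
            cur = next;
        }
        return head;
    }

    public static void main(String[] args) {
        // A recursive walk of a chain this deep would likely exhaust a default-sized stack.
        System.out.println(countReachable(buildChain(1_000_000)));
    }
}
```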
------------- PR Comment: https://git.openjdk.org/jdk/pull/25448#issuecomment-2915330483 From mchevalier at openjdk.org Wed May 28 07:55:05 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 28 May 2025 07:55:05 GMT Subject: Integrated: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow In-Reply-To: References: Message-ID: On Mon, 26 May 2025 12:07:08 GMT, Marc Chevalier wrote: > There is nothing very wrong here: > - the graph is not broken > - the algorithm is correct > > It just happens that the graph is very deep, it has a very long, narrow chain of nodes because of crazy unrolling, because of `LoopUnrollLimit=8192`. This depth simply makes the algorithm recurse deeper than the stack size allows. This kind of graph shape is not quite trivial to reproduce. The proposed reproducer is very easy to change into a non-reproducer, with many kinds of change, even that seem harmless to me. > > The fix is also pretty direct: let's change the recursive traversal, with a worklist-based iterative one. It's not as elegant, but it doesn't overflow. > > How sad stacks are still so bounded... This pull request has now been integrated. 
Changeset: 1d57ff8a Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/1d57ff8ad4938bc9ca9b1996eb200c1b51bdf300 Stats: 82 lines in 3 files changed: 74 ins; 0 del; 8 mod 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow Reviewed-by: thartmann, kvn, mhaessig ------------- PR: https://git.openjdk.org/jdk/pull/25448 From duke at openjdk.org Wed May 28 08:10:51 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Wed, 28 May 2025 08:10:51 GMT Subject: RFR: 8324720: Instruction selection does not respect -XX:-UseBMI2Instructions flag In-Reply-To: References: Message-ID: <9-DLn3ISAWwAh5540SZ7I6gHu5Q9WoGLF-k-Etu_tYs=.e9bc84a7-11f2-416e-b0c8-909e0b45d162@github.com> On Fri, 23 May 2025 13:50:09 GMT, Saranya Natarajan wrote: > While executing a function performing an `a >> b` operation with the `-XX:-UseBMI2Instructions` flag, the generated code contains the BMI2 instruction `sarx eax,esi,edx`. The expected output should not contain any BMI2 instruction. > > ### Analysis and solution > > As suggested by @merykitty in [JDK-8324720](https://bugs.openjdk.org/browse/JDK-8324720), the initial idea was to make `VM_Version::supports_bmi2()` respect the `UseBMI2Instructions` flag by disabling the BMI2 feature when the `UseBMI2Instructions` runtime flag is explicitly set to false. This fix is similar to how other runtime flags, such as `UseAPX` and `UseAVX`, enable or disable specific code and register sets. However, some test failures were encountered while running tests on this fix. > > The first set of failures was caused by an assertion check on `VM_Version::supports_bmi2()` while generating some BMI2-specific instructions. This was caused by the stub generator generating AVX-512-specific code that uses these BMI2 instructions. It should be noted that the `UseAVX` flag is set by default to the highest supported version available on the x86 machine. This in turn allows AVX-512-specific code generation whenever possible.
In order not to compromise the performance benefits of using AVX-512, the proposed fix only disables the BMI2 feature if AVX-512 features are also disabled (or not available on the machine) along with the UseBMI2Instructions flag. > > The second failure occurred in `compiler/intrinsics/sha/cli/TestUseSHA512IntrinsicsOptionOnSupportedCPU.java` where a warning "_Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU_." was returned on an AMD64 machine that had support for SHA512. Looking into `compiler/testlibrary/sha/predicate/IntrinsicPredicates.java` it was found that the predicate for AMD64 was not in line with the changes introduced by [JDK-8341052](https://bugs.openjdk.org/browse/JDK-8341052) in commit [85c1aea](https://github.com/openjdk/jdk/pull/20633/commits/85c1aea90b10014aa34dfc902dff2bfd31bd70c0). Thank you for the review. My first approach to the fix was in line with your comments. I will go back and implement the changes based on this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25415#issuecomment-2915378455 From epeter at openjdk.org Wed May 28 08:36:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 08:36:51 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Thu, 22 May 2025 08:35:18 GMT, Roland Westrelin wrote: > The test case has an out of loop `Store` with an `AddP` address > expression that has other uses and is in the loop body. Schematically, > only showing the address subgraph and the bases for the `AddP`s: > > > Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 > -> CastPP#110 > > > Both `AddP`s have the same base, a `CastPP` that's also in the loop > body. > > That loop is a counted loop and only has 3 iterations so is fully > unrolled. First, one iteration is peeled: >
First, one iteration is peeled: > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > The `AddP`s and `CastPP` are cloned (because in the loop body). As > part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is > called. It finds the test that guards `CastPP#283` in the peeled > iteration dominates and replaces the test that guards `CastPP#110` > (the test in the peeled iteration is the clone of the test in the > loop). That causes `CastPP#110`'s control to be updated to that of the > test in the peeled iteration and to be yanked from the loop. So now > `CastPP#283` and `CastPP#110` have the same inputs. > > Next unrolling happens: > > > /-> CastPP#110 > /-> AddP#400 -> AddP#401 -> CastPP#110 > Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 > \ -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > `AddP`s are cloned once more but not the `CastPP`s because they are > both in the peeled iteration now. A new `Phi` is added. > > Next igvn runs. It's going to push the `AddP`s through the `Phi`s. > > Through `Phi#477`: > > > > /-> CastPP#110 > Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 > \ -> AddP#134 -> CastPP#110 > -> AddP#277 -> AddP#278 -> CastPP#283 > -> CastPP#283 > > > > Through `Phi#360`: > > > /-> AddP#134 -> CastPP#110 > /-> Phi#509 -> AddP#401 -> CastPP#110 > Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 > -> Phi#514 -> CastPP#283 > -> CastP#110 > > > Then `Phi#514` which has 2 `CastPP`s as input with identical inputs is > transformed into anot... @rwestrel Thanks for looking into this! My (somewhat limited) experience with delaying optimizations is that this can be quite brittle. You need to get the condition just right, otherwise it just happens again in some generalized case again - maybe you check for 1 level, and later it happens with 2 or more layers. 
I'm half-understanding the example you present. Can you show the IR nodes for your last step: Store#195 -> AddP#516 -> AddP#544 -> CastPP#110 -> CastPP#529 What exactly are the bases there? Your simplified drawings seem to show the flow of computation, but I cannot see what the bases are in it, right? You could enhance it, for example with `AddP#nnn(base:nnn)`. I think that would help me follow the example. Maybe some more full IR snippets could be helpful, maybe even IGV drawings. But that may be more work for you. I'm wondering if we could not have some other "cleanup" optimizations that fix up the bases. What are the assumptions about merging AddP's at a Phi? Is the base from before the Phi propagated to after the Phi? I'm missing some base understanding here to see through this ;) src/hotspot/share/opto/cfgnode.cpp line 2107: > 2105: } > 2106: return false; > 2107: } You check for a single level here. Could the same happen over multiple levels? ------------- PR Review: https://git.openjdk.org/jdk/pull/25386#pullrequestreview-2874076352 PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2111247366 From epeter at openjdk.org Wed May 28 08:36:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 08:36:52 GMT Subject: RFR: 8351889: C2 crash: assertion failed: Base pointers must match (addp 344) In-Reply-To: References: Message-ID: On Wed, 28 May 2025 08:21:04 GMT, Emanuel Peter wrote: >> The test case has an out of loop `Store` with an `AddP` address >> expression that has other uses and is in the loop body. Schematically, >> only showing the address subgraph and the bases for the `AddP`s: >> >> >> Store#195 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> CastPP#110 >> >> >> Both `AddP`s have the same base, a `CastPP` that's also in the loop >> body. >> >> That loop is a counted loop and only has 3 iterations so is fully >> unrolled. 
First, one iteration is peeled: >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#133 -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> The `AddP`s and `CastPP` are cloned (because in the loop body). As >> part of peeling, `PhaseIdealLoop::peeled_dom_test_elim()` is >> called. It finds the test that guards `CastPP#283` in the peeled >> iteration dominates and replaces the test that guards `CastPP#110` >> (the test in the peeled iteration is the clone of the test in the >> loop). That causes `CastPP#110`'s control to be updated to that of the >> test in the peeled iteration and to be yanked from the loop. So now >> `CastPP#283` and `CastPP#110` have the same inputs. >> >> Next unrolling happens: >> >> >> /-> CastPP#110 >> /-> AddP#400 -> AddP#401 -> CastPP#110 >> Store#195 -> Phi#360 -> Phi#477 -> AddP#133 -> AddP#134 -> CastPP#110 >> \ -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> `AddP`s are cloned once more but not the `CastPP`s because they are >> both in the peeled iteration now. A new `Phi` is added. >> >> Next igvn runs. It's going to push the `AddP`s through the `Phi`s. >> >> Through `Phi#477`: >> >> >> >> /-> CastPP#110 >> Store#195 -> Phi#360 -> AddP#510 -> Phi#509 -> AddP#401 -> CastPP#110 >> \ -> AddP#134 -> CastPP#110 >> -> AddP#277 -> AddP#278 -> CastPP#283 >> -> CastPP#283 >> >> >> >> Through `Phi#360`: >> >> >> /-> AddP#134 -> CastPP#110 >> /-> Phi#509 -> AddP#401 -> CastPP#110 >> Store#195 -> AddP#516 -> Phi#515 -> AddP#278 -> CastPP#283 >> -> Phi#514 -> CastPP#283 >> ... > > src/hotspot/share/opto/cfgnode.cpp line 2107: > >> 2105: } >> 2106: return false; >> 2107: } > > You check for a single level here. Could the same happen over multiple levels? If an update should come from further up, but has not propagated down? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25386#discussion_r2111253290 From chagedorn at openjdk.org Wed May 28 08:41:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 28 May 2025 08:41:54 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v5] In-Reply-To: References: Message-ID: On Tue, 6 May 2025 10:28:59 GMT, Jatin Bhateja wrote: >>> @jatin-bhateja I think I'd rather wait until you have more thorough testing and the proofs I asked for, otherwise I would need to run the testing twice ;) >> >> I have added comments in the code which give sufficient details, let me know if you still need more explanation > >> @jatin-bhateja I really don't want you to feel forced to do anything here. If you don't want to write the tests or proofs, then I would suggest just to "backout" the problematic changes. I'm sure someone else will do both proofs and tests once we can do these optimizations even more powerfully with `KnownBits`. > > Hi @eme64 , Thanks for your pointers, let me do the needful. Hi @jatin-bhateja, any update on this? Just to let you know, the fork is coming up next Thursday but we would still have time to fix it in RDP 1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2915466367 From eosterlund at openjdk.org Wed May 28 08:54:26 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 28 May 2025 08:54:26 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent Message-ID: The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit.
Then the trailing fence won't get executed, and sequential consistency is broken. My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits. This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. ------------- Commit messages: - 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent Changes: https://git.openjdk.org/jdk/pull/25483/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25483&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351997 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25483.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25483/head:pull/25483 PR: https://git.openjdk.org/jdk/pull/25483 From vklang at openjdk.org Wed May 28 08:59:52 2025 From: vklang at openjdk.org (Viktor Klang) Date: Wed, 28 May 2025 08:59:52 GMT Subject: RFR: 8351997: AArch64: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 08:49:17 GMT, Erik Österlund wrote: > The optimized fast_aputfield bytecode on AArch64 stores the field flags in r3, and performs the leading and trailing fencing depending on its volatile bit being set or not. However, r3 is also the last temp register passed in to the barrier set for reference stores, and G1 clobbers it in a way that may clear the volatile bit. Then the trailing fence won't get executed, and sequential consistency is broken. > > My fix puts the flags in r5 instead, which is the register that was used by normal aputfield bytecodes. This way, barriers don't clobber the volatile bits.
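For readers unfamiliar with the "Dekker duality" mentioned in this thread, here is a minimal, generic sketch of the pattern. It is not the `Exchanger` code and not the reproducer; `DekkerSketch`, `x`, `y`, `r1` and `r2` are hypothetical names, and `int` flags are used for brevity even though the bug concerned volatile reference stores. Each thread writes its own volatile flag and then reads the other's. Under sequential consistency of volatile accesses, at least one thread must observe the other's write, so both reads returning 0 is not allowed.

```java
public class DekkerSketch {
    // Hypothetical volatile flags; the trailing fence after a volatile store
    // is what keeps the following volatile load from being reordered before it.
    static volatile int x;
    static volatile int y;
    static int r1;
    static int r2;

    // Runs the store-then-load pattern `rounds` times and counts how often
    // both threads read 0, which the Java memory model forbids for volatiles.
    static int countBothZero(int rounds) {
        int bothZero = 0;
        for (int i = 0; i < rounds; i++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; });
            Thread b = new Thread(() -> { y = 1; r2 = x; });
            a.start();
            b.start();
            try {
                a.join();
                b.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            // join() makes the plain writes to r1 and r2 visible here.
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(1000));
    }
}
```

If the fence after the volatile store were skipped, as described for the clobbered volatile bit, the forbidden outcome could appear, and code that waits on the other side's flag could hang.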
> > This bug has been observed to mess up a classic Dekker duality in the java.util.concurrent.Exchanger class, leading to a hang in the test/jdk/java/util/concurrent/Exchanger/ExchangeLoops.java test that exercises it. Using G1 and -Xint a reproducer hangs 30/100 times in mach5. With the fix, the same reproducer hangs 0/100 times. Can confirm that this observably mitigates the reported issue with Exchanger in ExchangeLoops. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25483#issuecomment-2915523409 From epeter at openjdk.org Wed May 28 09:17:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 09:17:55 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v4] In-Reply-To: References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> Message-ID: <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> On Tue, 27 May 2025 14:11:34 GMT, Jatin Bhateja wrote: >> This is a follow-up PR#22755 to improve Float16 operations inferencing. >> >> The existing scheme to detect Float16 operations for some operations is based on pattern matching which expects to receive inputs through ConvHF2F IR, this patch extends matching to accept constant floating point inputs within the Float16 value range. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 > - Enabling some test points > - Adding test points and some re-factoring > - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8352635 > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 > - 8352635: Improve inferencing of Float16 operations with constant inputs @jatin-bhateja That looks very promising, thanks for working on that! 
src/hotspot/share/opto/convertnode.cpp line 265: > 263: Node* con_inp = nullptr; > 264: Node* var_inp = nullptr; > 265: if (Float16NodeFactory::is_float32_binary_oper(in(1)->Opcode())) { It would be nice if you had a summary here, mentioning what "patterns" you are transforming `from -> to`. src/hotspot/share/opto/convertnode.cpp line 280: > 278: } > 279: > 280: if (con_inp && var_inp && These are implicit null checks. The style guide does not allow this: https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc. Suggestion: if (con_inp != nullptr && var_inp != nullptr && src/hotspot/share/opto/convertnode.cpp line 290: > 288: // If constant lie within Float16 value range, convert it to > 289: // a half-float constant. > 290: if (StubRoutines::hf2f(StubRoutines::f2hf(conF)) == conF) { How does this behave with `NaN` values? Do you have a test for that below? src/hotspot/share/opto/convertnode.cpp line 298: > 296: } else { > 297: f16bOp = phase->transform(Float16NodeFactory::make(f32bOp->Opcode(), f32bOp->in(0), new_var_inp, new_con_inp)); > 298: } Why is the order important here? A comment could help :) src/hotspot/share/opto/subnode.cpp line 566: > 564: // applicable to other floating point types. > 565: // There are no known undefined, unspecified or implimentation specific > 566: // behaviors w.r.t to floating point non-pointer subtraction. That sounds like we are not quite sure "no known" ... problems. Could there be any, or are we sure there are none? test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 320: > 318: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / INEXACT_FP16); > 319: assertResult(Float.float16ToFloat(res), 32.125f, "testInexactFP16ConstantPatterns"); > 320: } Alignment is messed up by one space indentation. 
Can you add a comment why we are expecting none of the `HF` ops here? Are we expecting any other ops, maybe `F` ops? It could be good to check for that, so that we are sure that we get anything even close to our expectation. test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 363: > 361: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / EXACT_FP16); > 362: assertResult(Float.float16ToFloat(res), 32.125f, "testExactFP16ConstantPatterns"); > 363: } Can we have a test that picks a random `FP16` value, and does result verification on it? Because currently, you are testing the new pattern only with a few example values. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24179#pullrequestreview-2874142792 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111290680 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111316864 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111320705 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111334455 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111337686 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111347398 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111351316 From epeter at openjdk.org Wed May 28 09:17:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 09:17:55 GMT Subject: RFR: 8352635: Improve inferencing of Float16 operations with constant inputs [v4] In-Reply-To: <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> References: <44nVQBYgzCOB2mAB9xtAPvkUcOMJOITA2VjMdDFgm1g=.48266693-48bf-41db-8871-a7dcafe93509@github.com> <6PFX21b9eT5mQv8Ym7b_RuKNpnuQ5CVqhc8TKxstlYo=.eb7d9f85-5e49-4e8f-b17a-c8e3728e7624@github.com> Message-ID: <-d846uXzYApO-CUq6peUgguY2YLpvG6ioAdVkN1wHG0=.94a09310-9d87-481c-b374-05ae99db0133@github.com> 
On Wed, 28 May 2025 08:42:07 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 >> - Enabling some test points >> - Adding test points and some re-factoring >> - Merge branch 'master' of https://github.com/openjdk/jdk into JDK-8352635 >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352635 >> - 8352635: Improve inferencing of Float16 operations with constant inputs > > src/hotspot/share/opto/convertnode.cpp line 265: > >> 263: Node* con_inp = nullptr; >> 264: Node* var_inp = nullptr; >> 265: if (Float16NodeFactory::is_float32_binary_oper(in(1)->Opcode())) { > > It would be nice if you had a summary here, mentioning what "patterns" you are transforming `from -> to`. Something like: Suggestion: // Pattern: ConvF2HF(binopF(conF, ConvHF2F(varS))) // -> ReinterpretHF2SNode(binopHF(conHF, ReinterpretS2HFNode(varS))) // This allows other HF operations in inputs and outputs to fold away the reinterpret nodes, // hopefully ending up with mostly HF arithmetic operations only. Node* con_inp = nullptr; Node* var_inp = nullptr; if (Float16NodeFactory::is_float32_binary_oper(in(1)->Opcode())) { You could also rename `f32bOp -> binopF`, `con_inp -> conF` and `var_inp -> varS`. I think these names are a bit more expressive, and carry the expected type in the name, that would make reading this code easier. > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 320: > >> 318: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / INEXACT_FP16); >> 319: assertResult(Float.float16ToFloat(res), 32.125f, "testInexactFP16ConstantPatterns"); >> 320: } > > Alignment is messed up by one space indentation. > > Can you add a comment why we are expecting none of the `HF` ops here? > Are we expecting any other ops, maybe `F` ops? 
> It could be good to check for that, so that we are sure that we get anything even close to our expectation. Same for the tests below :) > test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 363: > >> 361: res += Float.floatToFloat16(POSITIVE_ZERO_VAR.floatValue() / EXACT_FP16); >> 362: assertResult(Float.float16ToFloat(res), 32.125f, "testExactFP16ConstantPatterns"); >> 363: } > > Can we have a test that picks a random `FP16` value, and does result verification on it? Because currently, you are testing the new pattern only with a few example values. And: your pattern matching allows the constant to be lhs or rhs, so you should add corresponding tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111305918 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111348118 PR Review Comment: https://git.openjdk.org/jdk/pull/24179#discussion_r2111355331 From epeter at openjdk.org Wed May 28 09:22:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 09:22:54 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: On Tue, 20 May 2025 19:39:30 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > This pr is splited from https://github.com/openjdk/jdk/pull/25341, and contains only share code change. > > This patch enable the vectorization of statement like `fd_1 bop fd_2 ? res_1 : res_2` in a loop. > > The current behaviour on other platforms support vecatorization of `fd_1 bop fd_2 ? res_1 : res_2` in a loop only when `fd` and `res` have the same size, but this constraint seems not necessary at least not necessary on riscv, so I relax this constraint on riscv, maybe on other platforms it can be relaxed too, but currently I only made it work on riscv. 
> Besides this, I also relax the constraint on transforming Op_CMoveI/L to Op_VectorBlend on riscv, which brings some extra benefit when the `res` is not of float or double type. > Both relaxations bring performance benefits via vectorization. > > Compared with other runs (master, master with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on, patch without flags turned on), the average improvement introduced by the patch with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on is more than 2.1 times; in some cases it can bring more than 4 times improvement. > When the flags are turned off (`-XX:-UseVectorCmov -XX:-UseCMoveUnconditionally`), there is no regression on average. > > Check more details at: https://github.com/openjdk/jdk/pull/25341. > > Thanks @Hamlin-Li Thanks for working on this! Can you please provide the JMH benchmark results for your measurements? It would also be good to have some IR tests that cover the newly vectorized cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25336#issuecomment-2915593133 From rcastanedalo at openjdk.org Wed May 28 09:26:54 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 28 May 2025 09:26:54 GMT Subject: RFR: 8327963: C2: fix construction of memory graph around Initialize node to prevent incorrect execution if allocation is removed [v8] In-Reply-To: <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> References: <3jUFOPYDIqmzEywhzf58guwS0qZGBUCMZ3lXeltlS3c=.5c82601f-cf4d-4b2a-a525-1f8f4c7c4a3b@github.com> <1gdeBnZ7YuIf9CgQW2bCXkDDBWPjUgRnickHts-fvzE=.e6e901ba-3e9f-41a2-9c68-167a879e9655@github.com> Message-ID: On Thu, 22 May 2025 15:20:14 GMT, Roland Westrelin wrote: >> An `Initialize` node for an `Allocate` node is created with a memory >> `Proj` of adr type raw memory.
In order for stores to be captured, the >> memory state out of the allocation is a `MergeMem` with slices for the >> various object fields/array element set to the raw memory `Proj` of >> the `Initialize` node. If `Phi`s need to be created during later >> transformations from this memory state, the `Phi` for a particular >> slice gets its adr type from the type of the `Proj` which is raw >> memory. If during macro expansion, the `Allocate` is found to have no >> use and so can be removed, the `Proj` out of the `Initialize` is >> replaced by the memory state on input to the `Allocate`. A `Phi` for >> some slice for a field of an object will end up with the raw memory >> state on input to the `Allocate` node. As a result, memory state at >> the `Phi` is incorrect and incorrect execution can happen. >> >> The fix I propose is, rather than have a single `Proj` for the memory >> state out of the `Initialize` with adr type raw memory, to use one >> `Proj` per slice added to the memory state after the `Initialize`. Each >> of the `Proj` should return the right adr type for its slice. For that >> I propose having a new type of `Proj`: `NarrowMemProj` that captures >> the right adr type. >> >> Logic for the construction of the `Allocate`/`Initialize` subgraph is >> tweaked so the right adr type captured in its own `NarrowMemProj` is >> added to the memory subgraph. Code that removes an allocation or moves >> it also has to be changed so it correctly takes the multiple memory >> projections out of the `Initialize` node into account. >> >> One tricky issue is that when EA splits types for a scalar replaceable >> `Allocate` node: >> >> 1- the adr type captured in the `NarrowMemProj` becomes out of sync >> with the type of the slices for the allocation >> >> 2- before EA, the memory state for one particular field out of the >> `Initialize` node can be used for a `Store` to the just allocated >> object or some other.
So we can have a chain of `Store`s, some to >> the newly allocated object, some to some other objects, all of them >> using the state of `NarrowMemProj` out of the `Initialize`. After >> split unique types, the `NarrowMemProj` is for the slice of a >> particular allocation. So `Store`s to some other objects shouldn't >> use that memory state but the memory state before the `Allocate`. >> >> For that, I added logic to update the adr type of `NarrowMemProj` >> during split uni... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/multnode.cpp line 49: > 47: assert((Opcode() != Op_If && Opcode() != Op_RangeCheck) || which_proj == (uint)true || which_proj == (uint)false, "must be 1 or 0"); > 48: assert(number_of_projs(which_proj) <= 1, "only when there's a single projection"); > 49: auto find_proj = [which_proj, this](ProjNode* proj) { This does not build on macosx-aarch64: src/hotspot/share/opto/multnode.cpp:49:21: error: lambda capture 'which_proj' is not used [-Werror,-Wunused-lambda-capture] auto find_proj = [which_proj, this](ProjNode* proj) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24570#discussion_r2111379254 From epeter at openjdk.org Wed May 28 09:29:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 09:29:56 GMT Subject: RFR: 8357554: Enable vectorization of Bool -> CMove with different type size (on riscv) In-Reply-To: References: Message-ID: <63hCLve9SdiTGRuvoxjSjee_rD1CGBSVWzBG5cIz6iQ=.55fa01e4-ebae-4eef-aec8-dfddab23e84b@github.com> On Tue, 20 May 2025 19:39:30 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > This pr is splited from https://github.com/openjdk/jdk/pull/25341, and contains only share code change. > > This patch enable the vectorization of statement like `fd_1 bop fd_2 ? res_1 : res_2` in a loop. 
> > The current behaviour on other platforms supports vectorization of `fd_1 bop fd_2 ? res_1 : res_2` in a loop only when `fd` and `res` have the same size, but this constraint does not seem necessary, at least not on riscv, so I relax this constraint on riscv; maybe it can be relaxed on other platforms too, but currently I only made it work on riscv. > Besides this, I also relax the constraint on transforming Op_CMoveI/L to Op_VectorBlend on riscv, which brings some extra benefit when the `res` is not of float or double type. > Both relaxations bring performance benefits via vectorization. > > Compared with other runs (master, master with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on, patch without flags turned on), the average improvement introduced by the patch with `-XX:+UseVectorCmov -XX:+UseCMoveUnconditionally` turned on is more than 2.1 times; in some cases it can bring more than 4 times improvement. > When the flags are turned off (`-XX:-UseVectorCmov -XX:-UseCMoveUnconditionally`), there is no regression on average. > > Check more details at: https://github.com/openjdk/jdk/pull/25341. > > Thanks src/hotspot/cpu/riscv/matcher_riscv.hpp line 204: > 202: static bool supports_vectorize_cmove_bool_unconditionally() { > 203: return true; > 204: } Does RISCV support the use of any input vector element type, including 8bit, 16bit, 32bit and 64bit masks, and any elements we would be blending, incl. `byte, short, char, int, long, HF, F, D`? Because it sounds like you are promising this really "unconditionally". Or what exactly do you mean by "unconditionally"? src/hotspot/share/opto/superword.cpp line 2363: > 2361: VectorNode::is_vectorize_cmove_bool_unconditionally_supported()) { > 2362: return true; > 2363: } Can you please list which additional cases this now allows? I suppose `D/F` comparison for the `Bool`, and then `D/F` inputs for `CMove`, but we can mismatch, e.g. compare `F` but blend `D`, right?
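To make the "compare `F` but blend `D`" case concrete, here is a minimal Java sketch of such a loop. The class and method names are hypothetical. Whether C2 turns the ternary into a `CMove` and vectorizes it depends on the platform and on the flags discussed in this thread; the sketch only shows the source shape in which the comparison operands (32-bit `float`) and the selected values (64-bit `double`) have different sizes.

```java
import java.util.Arrays;

public class CMoveMixedSizeSketch {
    // Shape `fd_1 bop fd_2 ? res_1 : res_2`: the comparison is on floats,
    // the selected results are doubles.
    static void select(float[] a, float[] b, double[] x, double[] y, double[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = (a[i] > b[i]) ? x[i] : y[i];
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 5f, 3f, 0f};
        float[] b = {2f, 4f, 3f, -1f};
        double[] x = {10.0, 20.0, 30.0, 40.0};
        double[] y = {-1.0, -2.0, -3.0, -4.0};
        double[] out = new double[4];
        select(a, b, x, y, out);
        System.out.println(Arrays.toString(out));
    }
}
```

An IR test for this PR could presumably use loops of this shape for each combination of compared and blended types, but which combinations are actually vectorized is for the patch author to state.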
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2111380138 PR Review Comment: https://git.openjdk.org/jdk/pull/25336#discussion_r2111384982 From epeter at openjdk.org Wed May 28 09:55:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 09:55:57 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Sun, 25 May 2025 13:17:49 GMT, Hannes Greule wrote: >> This change improves the precision of the `Mod(I|L)Node::Value()` functions. >> >> I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. >> The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. >> >> ### Monotonicity >> >> Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). >> >> ### Testing >> >> I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). >> >> Please review and let me know what you think. 
>> >> ### Other >> >> The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. This change diverges them again, but similar improvements could be made after #17508. >> >> During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: >> - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? >> - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. > > Hannes Greule has updated the pull request incrementally with three additional commits since the last revision: > > - Update ModL comment > - Use TOP instead of ZERO > - Apply suggested test changes @SirYwell Thanks for looking into this, that looks promising! I have two bigger comments: - Could we unify the L and I code, either using C++ templating or `BasicType`? It would reduce code duplication. - Can we have some tests where the input ranges are random as well, and where we check the output ranges with some comparisons? ------------------ Copied from the code comment: > Nice work with the examples you already have, and randomizing some of it! > > I would like to see one more generalized test. > - compute `res = lhs % rhs` > - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. 
> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. > > Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. > > I hope that makes sense :) > This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. > > This is an example, where I asked someone to try this out as well: > https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 src/hotspot/share/opto/divnode.cpp line 1222: > 1220: > 1221: const TypeInt *i1 = t1->isa_int(); > 1222: const TypeInt *i2 = t2->isa_int(); Suggestion: const TypeInt* i1 = t1->isa_int(); const TypeInt* i2 = t2->isa_int(); In new code style, we put the asterisk on the left. We fix it whenever we touch old code. src/hotspot/share/opto/divnode.cpp line 1515: > 1513: > 1514: const TypeLong *i1 = t1->isa_long(); > 1515: const TypeLong *i2 = t2->isa_long(); The code below is basically a duplication. Could you unify the code, either using c++ templates or `BasicType`? test/hotspot/jtreg/compiler/c2/gvn/ModINodeValueTests.java line 92: > 90: @IR(failOn = {IRNode.MOD_I, IRNode.CMP_I}) > 91: // The sign of the result of % is the same as the sign of the dividend, > 92: // i.e., posVal % x < 0 => false. Suggestion: // i.e., POS_INT % x < 0 => false. Same everywhere below. test/hotspot/jtreg/compiler/c2/gvn/ModINodeValueTests.java line 201: > 199: // in bounds, cannot optimize > 200: return ((byte) x) % (((char) y) + 1) <= -128; > 201: } Nice work with the examples you already have, and randomizing some of it! 
I would like to see one more generalized test. - compute `res = lhs % rhs` - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. I hope that makes sense :) This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. This is an example, where I asked someone to try this out as well: https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25254#pullrequestreview-2874299006 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2111393407 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2111407282 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2111413138 PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2111426388 From galder at openjdk.org Wed May 28 11:39:53 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 28 May 2025 11:39:53 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: On Tue, 27 May 2025 17:26:59 GMT, Manuel Hässig wrote: > ## Summary > > On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: > > ; OptoAssembly > 03d decode_heap_oop_not_null R8,R10 > 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 > > ; x86 > 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused > 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset > > > This PR adds a peephole optimization to remove such redundant `lea`s. > > ## The Issue in Detail > > The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes > > LoadN -> decodeHeapOop_not_null -> leaP* > ______________________________? > > where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size).
Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: > > https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 > > On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. > > This leaves us with a handful of possible solutions: > 1. implement narrow bases for derived oops in oop maps, > 2. perform some dead code elimination after we know which oops are part of oop maps, > 3. add a peephole optimization to simply remove unused `lea`s. > > Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the register allocator. However, rewriting the oop map machinery to remove a... test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 287: > 285: phase = {CompilePhase.FINAL_CODE}, > 286: applyIfAnd = {"MaxHeapSize", "<1073741824", "UseAVX", "=3"}, > 287: applyIfPlatform = {"mac", "false"}) Doesn't `UseAVX=3` already imply that `mac=false`? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2111622778 From epeter at openjdk.org Wed May 28 12:01:12 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 12:01:12 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v3] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 02:35:50 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. >> >> Thoughts and reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: > > - Replace uabs usage with ABS > - Merge branch 'master' into abs-value > - Merge > - Improve AbsNode::Value @jaskarth Nice work! I have a few comments below. One is about more randomized tests. I'm thinking about something like this: - compute `res = Math.abs(x)` - Truncate `x` with randomly produced bounds from Generators, like this: `x = Math.max(lo, Math.min(hi, x))`. - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. - Then fuzz the generated method a few times with random inputs for `x`, and check that the sum and res value are the same for compiled and interpreted code. 
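The recipe above could be sketched in plain Java roughly as follows. This is only an illustration under stated assumptions: the bounds `LO`/`HI` and the constant `CON` are hypothetical fixed placeholders (a real test would draw them from the test library's Generators), and a real jtreg test would compare compiled code against the interpreter rather than against a hand-written reference method.

```java
import java.util.Random;

// Hypothetical sketch of the range-checking recipe; not code from the PR.
class AbsRangeSketch {
    // Placeholder bounds and constant; a real test would randomize these per run.
    static final int LO = -9, HI = 5, CON = 7;

    // Method under test: clamp x into [LO, HI], take abs, compare with constants.
    // If the compiler computed a wrong output range for abs, one of the
    // comparisons could be constant folded incorrectly and change `sum`.
    static int test(int x) {
        x = Math.max(LO, Math.min(HI, x));
        int res = Math.abs(x);
        int sum = 0;
        if (res < CON) { sum += 1; }
        if (res > HI)  { sum += 2; }
        if (res == 0)  { sum += 4; }
        return sum * 100 + res; // res is at most 9 here, so the packing is unambiguous
    }

    // Reference written without Math.abs/max/min, standing in for the interpreter.
    static int reference(int x) {
        if (x < LO) { x = LO; }
        if (x > HI) { x = HI; }
        int res = (x < 0) ? -x : x;
        int sum = 0;
        if (res < CON) { sum += 1; }
        if (res > HI)  { sum += 2; }
        if (res == 0)  { sum += 4; }
        return sum * 100 + res;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            int x = random.nextInt();
            if (test(x) != reference(x)) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("ok");
    }
}
```

Packing `sum` and `res` into one return value keeps both observable, so a wrongly folded comparison and a wrong result are both caught by a single equality check.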
I hope that makes sense :) This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. This is an example where I asked someone to try this out as well: https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 src/hotspot/share/opto/subnode.cpp line 1947: > 1945: > 1946: return IntegerType::make(ABS(t->get_con())); > 1947: } We used `uabs` before, what prevents you from doing that now? I guess you would need a templated version, hmm. Could be worth looking into creating one. src/hotspot/share/opto/subnode.cpp line 1956: > 1954: // - As abs(type_min+1) == type_max and for all n from type_min+1 to hi, abs(n) <= type_max, the upper bound must be type_max. > 1955: > 1956: return IntegerType::TYPE_DOMAIN; Nice, I like proofs! I was wondering if we can make it a little more concise, but it is up to you which version you want. Suggestion: // Both type_min and type_max are in the output type, hence we return the whole type domain: // - type_min is in t -> abs(type_min) = type_min // - type_min+1 is in t, because t is not a constant -> abs(type_min+1) = type_max return IntegerType::TYPE_DOMAIN; Meh, but yours is a bit more complete. Maybe better to keep yours. src/hotspot/share/opto/subnode.cpp line 1960: > 1958: > 1959: NativeType lo_abs = ABS(t->_lo); > 1960: NativeType hi_abs = ABS(t->_hi); Suggestion: // Knowing that min_type is not in t, we know there is no overflow. NativeType lo_abs = ABS(t->_lo); NativeType hi_abs = ABS(t->_hi); test/hotspot/jtreg/compiler/c2/irTests/TestIRAbs.java line 333: > 331: // [-9, -2] => [2, 9] > 332: return Math.abs(-((in & 7) + 2)) > 9; > 333: } Could we have some randomized cases here too? Or do we already have them somewhere?
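For the `type_min` special case discussed in the `subnode.cpp` comments above, the two facts the proof relies on can be checked directly in Java, since `Math.abs` on `int` and `long` wraps on the minimum value. This is a small illustrative sketch with a hypothetical class name, not part of the patch:

```java
// Hypothetical sketch: the two identities used by the proof for the
// case where the input range includes the minimum value.
class AbsMinSketch {
    // abs(type_min) == type_min, because -type_min overflows back to type_min.
    static boolean minMapsToMin() {
        return Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE
            && Math.abs(Long.MIN_VALUE) == Long.MIN_VALUE;
    }

    // abs(type_min + 1) == type_max, so the upper bound of the result is type_max.
    static boolean minPlusOneMapsToMax() {
        return Math.abs(Integer.MIN_VALUE + 1) == Integer.MAX_VALUE
            && Math.abs(Long.MIN_VALUE + 1) == Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(minMapsToMin() && minPlusOneMapsToMax());
    }
}
```

Together these show why a non-constant range starting at `type_min` has both `type_min` and `type_max` in its output, which is why the whole type domain is returned.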
------------- PR Review: https://git.openjdk.org/jdk/pull/23685#pullrequestreview-2874674288 PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2111638516 PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2111655487 PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2111658190 PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2111666504 From mhaessig at openjdk.org Wed May 28 12:05:26 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 28 May 2025 12:05:26 GMT Subject: RFR: 8354930: IGV: dump C2 graph before and after live range stretching Message-ID: This PR introduces a new phase `LIVE_RANGE_STRETCHING` that prints after live ranges have been stretched, if that happens at all. The phase `INITIAL_LIVENESS` is moved before live range stretching so we can compare the live ranges before and after stretching in IGV, which is useful for debugging why an oop suddenly belongs to an oop map.
## Testing - [ ] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15299362485) - [ ] tier1 and tier1, plus additional Oracle internal testing for all Oracle supported platforms and OSs - [x] verified that the new phase prints when it should in IGV and with `-XX:PrintPhaseLevel=4` ------------- Commit messages: - Introduce new phase "live range stretching" Changes: https://git.openjdk.org/jdk/pull/25492/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25492&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8354930 Stats: 7 lines in 3 files changed: 5 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25492.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25492/head:pull/25492 PR: https://git.openjdk.org/jdk/pull/25492 From mhaessig at openjdk.org Wed May 28 12:29:55 2025 From: mhaessig at openjdk.org (Manuel Hässig) Date: Wed, 28 May 2025 12:29:55 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: On Wed, 28 May 2025 11:33:13 GMT, Galder Zamarreño wrote: >> ## Summary >> >> On x86, chained dereferences of narrow oops at a constant offset from the base oop can use a `lea` instruction to perform the address computation in one go using the `leaP8Narrow`, `leaP32Narrow`, and `leaPCompressedOopOffset` matching rules. However, the generated code contains an additional `lea` with an unused result: >> >> ; OptoAssembly >> 03d decode_heap_oop_not_null R8,R10 >> 041 leaq R10, [R12 + R10 << 3 + #12] (compressed oop addressing) ; ptr compressedoopoff32 >> >> ; x86 >> 0x00007f1f210625bd: lea (%r12,%r10,8),%r8 ; result is unused >> 0x00007f1f210625c1: lea 0xc(%r12,%r10,8),%r10 ; the same computation as decode, but with offset >> >> >> This PR adds a peephole optimization to remove such redundant `lea`s.
>> >> ## The Issue in Detail >> >> The ideal subgraph producing redundant `lea`s, or rather redundant `decodeHeapOop_not_null`s, is `LoadN -> DecodeN -> AddP`, where both the address and base edge of the `AddP` originate from the `DecodeN`. After matching, this becomes >> >> LoadN -> decodeHeapOop_not_null -> leaP* >> ______________________________? >> >> where `leaP*` is either of `leaP8Narrow`, `leaP32Narrow`, or `leaPCompressedOopOffset` (depending on the heap location and size). Here, the base input of `leaP*` comes from the decode. Looking at the matching code path, we find that the `leaP*` rules match both the `AddP` and the `DecodeN`, since x86 can fold this, but the following code adds the decode back as the base input to `leaP*`: >> >> https://github.com/openjdk/jdk/blob/c29537740efb04e061732a700582d43b1956cff4/src/hotspot/share/opto/matcher.cpp#L1894-L1897 >> >> On its face, this is completely unnecessary if we matched a `leaP*`, since it already computes the result of the decode, so adding the `LoadN` node as base seems like the logical choice. However, if the derived oop computed by the `leaP*` gets added to an oop map, this `DecodeN` is needed as the base for the derived oop. Because as of now, derived oops in oop maps cannot have narrow base pointers. >> >> This leaves us with a handful of possible solutions: >> 1. implement narrow bases for derived oops in oop maps, >> 2. perform some dead code elimination after we know which oops are part of oop maps, >> 3. add a peephole optimization to simply remove unused `lea`s. >> >> Option 1 would have been ideal in the sense, that it is the earliest possible point to remove the decode, which would simplify the graph and reduce pressure on the regi... 
> > test/hotspot/jtreg/compiler/codegen/TestRedundantLea.java line 287: > >> 285: phase = {CompilePhase.FINAL_CODE}, >> 286: applyIfAnd = {"MaxHeapSize", "<1073741824", "UseAVX", "=3"}, >> 287: applyIfPlatform = {"mac", "false"}) > > Doesn't `UseAVX=3` already imply that `mac=false`? Almost, but not quite. The 2020 model of the Macbook Air and the Macbook Pro 13'' feature 10th generation Intel CPUs supporting AVX512 ([source](https://blog.reyem.dev/post/which-consumer-computers-support-avx-512/)). Also, both conditions have different purposes here. `mac=false` is set, because on MacOS we cannot guarantee what `leaP*` variant will be generated due to variations in the heap layout due to ASLR. `UseAVX=3` is there, because the test only works in that case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2111726320 From epeter at openjdk.org Wed May 28 12:31:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 12:31:59 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Wed, 14 May 2025 02:44:14 GMT, erifan wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 10 additional commits since the last revision: > > - Refactor the JTReg tests for compare.xor(maskAll) > > Also made a bit change to support pattern `VectorMask.fromLong()`. > - Merge branch 'master' into JDK-8354242 > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. 
> > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 2352855... @erifan Looks like you really improved things, nice work! I have some more comments below :) src/hotspot/share/opto/vectornode.cpp line 2213: > 2211: Node* in1 = in(1); > 2212: Node* in2 = in(2); > 2213: // Transformations for predicated IRs are not supported for now. Suggestion: // Transformations for predicated vectors are not supported for now. src/hotspot/share/opto/vectornode.cpp line 2215: > 2213: // Transformations for predicated IRs are not supported for now. > 2214: if (is_predicated_vector() || in1->is_predicated_vector() || > 2215: in2->is_predicated_vector()) { I would either put all on the same line, or all on separate lines. src/hotspot/share/opto/vectornode.cpp line 2219: > 2217: } > 2218: > 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast to in1. Suggestion: // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1. Would look a little cleaner, and you did also not write `Op_VectorMaskCmp` either ;) src/hotspot/share/opto/vectornode.cpp line 2225: > 2223: } > 2224: > 2225: const TypeVect* vmcast_vt = nullptr; Suggestion: const TypeVect* vector_mask_cast_vt = nullptr; I think it would not hurt to write it out. 
Otherwise, the reader always has to reconstruct that in their head. src/hotspot/share/opto/vectornode.cpp line 2230: > 2228: vmcast_vt = in1->as_Vector()->vect_type(); > 2229: in1 = in1->in(1); > 2230: } Add a comment why you check `in1->outcnt() == 1`. src/hotspot/share/opto/vectornode.cpp line 2233: > 2231: if (in2->Opcode() == Op_VectorMaskCast) { > 2232: in2 = in2->in(1); > 2233: } Wow, this seems to be an addition that is not covered in the patterns you mention above, right? But is that even necessary? I suppose here `in2 = VectorMaskCast(all_ones_vector)`. Would we not already want to transform this pattern in `VectorMaskCast::Ideal`, is that not possible and more powerful? src/hotspot/share/opto/vectornode.cpp line 2244: > 2242: // BoolTest doesn't support unsigned comparisons. > 2243: BoolTest::mask neg_cond = > 2244: (BoolTest::mask) (((VectorMaskCmpNode*) in1)->get_predicate() ^ 4); What is the hard-coded `^ 4` here? This whole line looks like we are looking at internals of the `VectorMaskCmpNode` or its predicate, and we should probably do that in some method there? Or maybe it should be part of the `BoolTest(::mask)` interface? src/hotspot/share/opto/vectornode.cpp line 2251: > 2249: predicate_node, vt); > 2250: if (vmcast_vt != nullptr) { > 2251: // We optimized out an VectorMaskCast, and in order to ensure type Suggestion: // We optimized out a VectorMaskCast, and in order to ensure type src/hotspot/share/opto/vectornode.cpp line 2253: > 2251: // We optimized out an VectorMaskCast, and in order to ensure type > 2252: // correctness, we need to regenerate one. VectorMaskCast will be encoded as > 2253: // empty for types with the same size. Suggestion: // a no-op (identity function) for types with the same size. Or what do you mean by "empty"? `TOP`? All zeros? 
test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 49: > 47: private static final VectorSpecies L_SPECIES = LongVector.SPECIES_MAX; > 48: private static final VectorSpecies F_SPECIES = FloatVector.SPECIES_MAX; > 49: private static final VectorSpecies D_SPECIES = DoubleVector.SPECIES_MAX; @jatin-bhateja Do you think it is sufficient to only test on `MAX` sizes here? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 96: > 94: Generator lGen = RD.uniformLongs(Long.MIN_VALUE, Long.MAX_VALUE); > 95: Generator fGen = RD.uniformFloats(Float.MIN_VALUE, Float.MAX_VALUE); > 96: Generator dGen = RD.uniformDoubles(Double.MIN_VALUE, Double.MAX_VALUE); Are you sure you only want to draw from the uniform distribution? If you don't super care about the distribution, please just take `RD.ints/longs/floats/doubles()`. That way, you get all sorts of distributions, and also some that include NaN values etc. I think that would be important for your float cmp cases, no? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: > 235: // Byte tests > 236: @Test > 237: @IR(counts = { IRNode.XOR_V_MASK, "= 0", IRNode.XOR_VB, "= 0" }, Could you still assert the presence of some other vectors, just to make sure we are indeed getting vectors here? test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 49: > 47: private static final VectorSpecies L_SPECIES = LongVector.SPECIES_MAX; > 48: private static final VectorSpecies F_SPECIES = FloatVector.SPECIES_MAX; > 49: private static final VectorSpecies D_SPECIES = DoubleVector.SPECIES_MAX; Are you taking `SPECIES_MAX` on purpose here, or could we take `SPECIES_PREFERRED` instead? @jatin-bhateja What is the best to do in these tests? I suppose best would be to test with all vector lengths... 
------------- PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2874740189 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111679899 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111679178 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111681447 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111684880 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111688348 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111695428 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111699396 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111705378 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111706856 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111726416 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111711829 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111717556 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111723220 From epeter at openjdk.org Wed May 28 12:32:00 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 12:32:00 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: <9u6hJ-WgnHLMaYBa8ViRdpUZY-bI2wOk-TCRKWJJdqk=.b3303f1f-da3b-4c2e-8f0c-a2e16ba9688e@github.com> On Wed, 28 May 2025 12:14:56 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >> Also made a bit change to support pattern `VectorMask.fromLong()`. 
>> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4... > > src/hotspot/share/opto/vectornode.cpp line 2244: > >> 2242: // BoolTest doesn't support unsigned comparisons. >> 2243: BoolTest::mask neg_cond = >> 2244: (BoolTest::mask) (((VectorMaskCmpNode*) in1)->get_predicate() ^ 4); > > What is the hard-coded `^ 4` here? This whole line looks like we are looking at internals of the `VectorMaskCmpNode` or its predicate, and we should probably do that in some method there? Or maybe it should be part of the `BoolTest(::mask)` interface? Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use it as `in1_mask_cmp` instead? > test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: > >> 235: // Byte tests >> 236: @Test >> 237: @IR(counts = { IRNode.XOR_V_MASK, "= 0", IRNode.XOR_VB, "= 0" }, > > Could you still assert the presence of some other vectors, just to make sure we are indeed getting vectors here? Not testing for any present vectors makes me a little nervous: what if we just don't get any vectors because inlining fails or something else silly happens? 
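For readers wondering about the `^ 4` discussed in the first comment above: as far as I recall, the `BoolTest::mask` values in HotSpot are laid out so that a condition and its negation differ only in the bit with value 4, and `BoolTest::negate()` relies on the same trick. The sketch below is a standalone model, not HotSpot code. The enum values are reproduced from memory and should be checked against `subnode.hpp`, and `negate_mask_sketch` is a hypothetical helper name.

```cpp
// Standalone model of the relevant BoolTest::mask values (assumed from
// memory of subnode.hpp; verify against the real header before relying on it).
enum MaskSketch : int {
  eq = 0, gt = 1, lt = 3, ne = 4, le = 5, ge = 7,
  unsigned_compare = 16,
  ule = unsigned_compare | le,
  uge = unsigned_compare | ge,
  ult = unsigned_compare | lt,
  ugt = unsigned_compare | gt
};

// Hypothetical helper: flipping the bit with value 4 maps each condition to
// its negation, for both the signed and the unsigned variants.
constexpr MaskSketch negate_mask_sketch(MaskSketch m) {
  return static_cast<MaskSketch>(m ^ 4);
}

static_assert(negate_mask_sketch(eq) == ne, "eq <-> ne");
static_assert(negate_mask_sketch(lt) == ge, "lt <-> ge");
static_assert(negate_mask_sketch(le) == gt, "le <-> gt");
static_assert(negate_mask_sketch(ult) == uge, "ult <-> uge");
static_assert(negate_mask_sketch(ule) == ugt, "ule <-> ugt");
```

Wrapping the XOR in a named helper like this is one way to address the request that the bit trick not appear inline at the use site.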
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111701943 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111728478 From epeter at openjdk.org Wed May 28 13:23:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 28 May 2025 13:23:52 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v5] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 08:06:37 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - +message in assert > - Move asserts around Looks reasonable to me :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25295#pullrequestreview-2875067357 From mchevalier at openjdk.org Wed May 28 13:29:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 28 May 2025 13:29:00 GMT Subject: RFR: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll [v5] In-Reply-To: References: Message-ID: <5wCzP6R2VlseRYe87x_sH6Kj2M07LkvvowIAwbFMGhM=.7b5f016e-86c4-417b-bb2b-810978133520@github.com> On Tue, 27 May 2025 08:06:37 GMT, Marc Chevalier wrote: >> This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - +message in assert > - Move asserts around Thank you all for the reviews and comments!
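The arithmetic in the quoted description can be checked mechanically. The sketch below models `julong` as `uint64_t` and compares the old bound with the bound the description argues for. The names `old_check_sketch` and `relaxed_check_sketch` are hypothetical, and the exact form of the assert after the fix is not shown in this thread, so treat the second function as an illustration of the stated bound only.

```cpp
#include <cstdint>

// max_juint modelled as 2^32 - 1, widened to 64 bits as in (julong)max_juint.
constexpr uint64_t max_juint_sketch = 0xFFFFFFFFull;
constexpr uint64_t two_pow_31 = uint64_t{1} << 31;

// Integer division truncates: floor((2^32 - 1) / 2) == 2^31 - 1.
constexpr uint64_t old_bound = max_juint_sketch / 2;
static_assert(old_bound == two_pow_31 - 1, "old bound is 2^31 - 1");

// Old condition as described: trip_count < (julong)max_juint / 2.
constexpr bool old_check_sketch(uint64_t trip_count) {
  return trip_count < old_bound;
}

// Bound argued for in the description: trip_count <= 2^31.
constexpr bool relaxed_check_sketch(uint64_t trip_count) {
  return trip_count <= two_pow_31;
}

// The two conditions disagree exactly on 2^31 - 1 and 2^31.
static_assert(!old_check_sketch(two_pow_31 - 1), "rejected by old bound");
static_assert(relaxed_check_sketch(two_pow_31 - 1), "accepted by relaxed bound");
static_assert(relaxed_check_sketch(two_pow_31), "accepted by relaxed bound");
static_assert(!relaxed_check_sketch(two_pow_31 + 1), "still rejected");
```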
------------- PR Comment: https://git.openjdk.org/jdk/pull/25295#issuecomment-2916344895 From mchevalier at openjdk.org Wed May 28 13:29:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 28 May 2025 13:29:01 GMT Subject: Integrated: 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll In-Reply-To: References: Message-ID: On Mon, 19 May 2025 06:43:38 GMT, Marc Chevalier wrote: > This assert seems a bit too tight. See the JBS issue to check the math: the bound of `trip_count` should be `<= 2^31`, while the current bound is ` < (julong)max_juint/2` = floor((2^32-1)/2) = (2^32-2) / 2 = 2^31-1. This pull request has now been integrated. Changeset: 4b9290af Author: Marc Chevalier URL: https://git.openjdk.org/jdk/commit/4b9290af0a46bdf662735c24d00732a4c1601102 Stats: 59 lines in 3 files changed: 57 ins; 0 del; 2 mod 8356647: C2: Excessively strict assert in PhaseIdealLoop::do_unroll Reviewed-by: chagedorn, epeter, dlong ------------- PR: https://git.openjdk.org/jdk/pull/25295 From syan at openjdk.org Wed May 28 13:31:59 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 28 May 2025 13:31:59 GMT Subject: RFR: 8357781: Deep recursion in PhaseCFG::set_next_call leads to stack overflow [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 11:52:14 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/StackOverflowInSetNextCall.java line 65: >> >>> 63: public static void main(String[] args) { >>> 64: for (int i = 0; i < 400; ++i) { >>> 65: test(); >> >> Do we need to use the return value of function `test`, to avoid the compiler do the dead code elimination > > Experimentally it's not useful since if the call was overall eliminated, it wouldn't reproduce the crash. > > Moreover the test uses `-XX:CompileCommand=compileonly,StackOverflowInSetNextCall::test` so `main` is not compiled so no dead code elimination can kick in. 
This is not even necessary: one can just force compilation of `test` (and more) without `-Xcomp` and CompileCommand just by having enough iterations of this loop. It's just not very nice as a test since it compiles a lot more things, and takes longer overall, without benefit. > > Also, the code couldn't be eliminated overall: `test()` could have some side effects, it would need inlining to conclude it can be removed, and even then, it can't since `test()` assigns `d` and reads `arr`: even if nothing happens actually, I don't think it could remove everything. Thanks for your detailed explanation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25448#discussion_r2111925304 From eastigeevich at openjdk.org Wed May 28 13:34:59 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 28 May 2025 13:34:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:44:46 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release.
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add JVMCINMethodData::has_mirror() src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 90: > 88: } > 89: } > 90: call->set_destination(x); The new code does not update trampoline with `x`. Also you need to handle properly the case of `trampoline` being null. IMO it should never be null. So `if` is not needed. I'd use `guarantee` here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2111934804 From dlunden at openjdk.org Wed May 28 13:46:10 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 28 May 2025 13:46:10 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:17:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head, this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reduces loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites, this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the otherhand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies.
>> >> A downfall of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt. >> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix uncommon_freq Another update: after running more benchmarks on `master` with this changeset applied, it now looks like there are some nice confirmed improvements. Furthermore, the regressions found when applying the changes and running on an earlier version seem to have disappeared. So, the results look good! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2916415615 From kvn at openjdk.org Wed May 28 14:11:16 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 May 2025 14:11:16 GMT Subject: RFR: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling [v25] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:57:11 GMT, Igor Veresov wrote: >> Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. 
>> >> More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 > > Igor Veresov has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge branch 'master' into pp2 > - Missing part of the merge > - Merge branch 'master' into pp2 > - Merge branch 'master' into pp2 > - 8357284: runtime/cds/appcds/aotProfile/AOTProfileFlags.java fails on non-debug platform > - 8357283: compiler/debug/TestStressBailout.java hangs when running with AOT cache > - Merge branch 'master' into pp2 > - Address Ioi's comments > - Merge branch 'master' into pp2 > - Address Ioi's comments > - ... and 80 more: https://git.openjdk.org/jdk/compare/2e8b195a...ed213368 Re-approved ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24886#pullrequestreview-2875268813 From jbhateja at openjdk.org Wed May 28 14:35:33 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 28 May 2025 14:35:33 GMT Subject: RFR: 8351635: C2 ROR/ROL: assert failed: Long constant expected Message-ID: This bug fix patch relaxes the strict assertion check to allow other pattern matches for degenerated long vector ROL/ROR operations with non-constant scalar shift values. Kindly review and share feedback. 
Best Regards, Jatin ------------- Commit messages: - 8351635: C2 ROR/ROL: assert failed: Long constant expected Changes: https://git.openjdk.org/jdk/pull/25493/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25493&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351635 Stats: 131 lines in 2 files changed: 129 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25493.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25493/head:pull/25493 PR: https://git.openjdk.org/jdk/pull/25493 From galder at openjdk.org Wed May 28 14:49:51 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 28 May 2025 14:49:51 GMT Subject: RFR: 8020282: Generated code quality: redundant LEAs in the chained dereferences In-Reply-To: References: Message-ID: On Wed, 28 May 2025 12:27:29 GMT, Manuel Hässig wrote: > Almost, but not quite. The 2020 model of the Macbook Air and the Macbook Pro 13'' feature 10th generation Intel CPUs supporting AVX512 ([source](https://blog.reyem.dev/post/which-consumer-computers-support-avx-512/)). Oh right, I had not realised the intel Macs had AVX512. Thanks for the clarification :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25471#discussion_r2112105615 From kxu at openjdk.org Wed May 28 14:49:59 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 28 May 2025 14:49:59 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Tue, 29 Apr 2025 16:42:41 GMT, Emanuel Peter wrote: >> Hello @eme64. I pinged you in [an in-line review](https://github.com/openjdk/jdk/pull/23506#discussion_r2042974649). Could you please provide some commons on this assertion? This is currently blocking my progress and breaking the build. Thank you very much! > > @tabjy Thanks for your patience, this one took me longer than I wanted. I responded like this above: > >> Hmm, ok I see.
Why don't you remove the asserts for now, and we see how clear the code looks now. I think I asked for the consistency check because I was confused by the previous code structure. Maybe it is ok now as it is. Ping @eme64 again for awareness. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2916624820 From iveresov at openjdk.org Wed May 28 15:18:03 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Wed, 28 May 2025 15:18:03 GMT Subject: Integrated: 8355003: Implement JEP 515: Ahead-of-Time Method Profiling In-Reply-To: References: Message-ID: On Fri, 25 Apr 2025 20:18:41 GMT, Igor Veresov wrote: > Improve warm-up time by making profile data from a previous run of an application instantly available, when the HotSpot Java Virtual Machine starts. Specifically, enhance the [AOT cache](https://openjdk.org/jeps/483) to store method execution profiles from training runs, reducing profiling delays in subsequent production runs. > > More details in the JEP: https://bugs.openjdk.org/browse/JDK-8325147 This pull request has now been integrated. 
Changeset: e3f85c96 Author: Igor Veresov URL: https://git.openjdk.org/jdk/commit/e3f85c961b4c1e5e01aedf3a0f4e1b0e6ff457fd Stats: 3324 lines in 59 files changed: 3111 ins; 100 del; 113 mod 8355003: Implement JEP 515: Ahead-of-Time Method Profiling Co-authored-by: John R Rose Co-authored-by: Vladimir Ivanov Co-authored-by: Ioi Lam Co-authored-by: Vladimir Kozlov Co-authored-by: Aleksey Shipilev Reviewed-by: kvn, ihse, cjplummer, iklam ------------- PR: https://git.openjdk.org/jdk/pull/24886 From qamai at openjdk.org Wed May 28 15:25:13 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 28 May 2025 15:25:13 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 14:46:16 GMT, Daniel Lundén wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix uncommon_freq > > src/hotspot/share/opto/gcm.cpp line 2305: > >> 2303: } >> 2304: } >> 2305: > > Can you explain this removal? It is incorrect, it should be `return this == b_loop`. However, I think it is redundant so I removed it altogether. > src/hotspot/share/opto/reg_split.cpp line 522: > >> 520: Block* b = cfg.get_block(bidx); >> 521: if (!loop->in_loop_nest(b)) { >> 522: continue; > > Is there not a more efficient way to iterate through all the loops in the loop nest? We are iterating through all the blocks in the loop nest. There is probably a more straight-forward way.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2112179738 PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2112181961 From kvn at openjdk.org Wed May 28 15:28:05 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 28 May 2025 15:28:05 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v13] In-Reply-To: References: Message-ID: On Mon, 12 May 2025 03:11:52 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ± 19.787 ns/op 56.228 ± 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ± 1.539 ns/op 43.332 ± 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ± 6.150 ns/op 29.757 ± 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Check for AVX2 for byte/long conversions src/hotspot/share/opto/superword.cpp line 2361: > 2359: > 2360: // Subword cast: Element sizes differ, but the platform supports a cast to change the def shape to the use shape. > 2361: if ((is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack_size, def_bt, use_bt)) { I see you use this set of conditions twice. Can it be a separate function? Also, `-1` is a strange argument for people who are not familiar with the code. Maybe add a `/* comment */` to it, or use some `#define` to give it a meaningful name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2112187921 From dlunden at openjdk.org Wed May 28 15:37:57 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 28 May 2025 15:37:57 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 15:22:29 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/reg_split.cpp line 522: >> >>> 520: Block* b = cfg.get_block(bidx); >>> 521: if (!loop->in_loop_nest(b)) { >>> 522: continue; >> >> Is there not a more efficient way to iterate through all the loops in the loop nest? > > We are iterating through all the blocks in the loop nest. There is probably a more straight-forward way. Yes, thanks, blocks is what I meant to write. OK, I can investigate if there is a more straightforward way.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21472#discussion_r2112208526 From eastigeevich at openjdk.org Wed May 28 15:46:05 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 28 May 2025 15:46:05 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:44:46 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add JVMCINMethodData::has_mirror() src/hotspot/share/code/nmethod.cpp line 1167: > 1165: #endif > 1166: + align_up(debug_info->data_size() , oopSize) > 1167: + align_up((int)sizeof(int) , oopSize); Why do we need `align_up((int)sizeof(int) , oopSize)`? 
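As background for the question above, this is what the expression evaluates to. The sketch assumes `align_up` rounds its first argument up to the next multiple of a power-of-two alignment, and that `oopSize` is 8 on a 64-bit build. Both are assumptions stated for illustration, and `align_up_sketch` is a hypothetical stand-in rather than the HotSpot function.

```cpp
// Hypothetical stand-in for HotSpot's align_up: round size up to the next
// multiple of alignment, where alignment is assumed to be a power of two.
constexpr int align_up_sketch(int size, int alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Assumed values: a 4-byte int and an 8-byte oopSize (64-bit build).
constexpr int int_size_assumed = 4;
constexpr int oop_size_assumed = 8;

// The extra term therefore reserves one oop-sized slot (8 bytes, not 4).
static_assert(align_up_sketch(int_size_assumed, oop_size_assumed) == 8,
              "a 4-byte int occupies one 8-byte aligned slot");
// Already-aligned sizes are left unchanged.
static_assert(align_up_sketch(16, oop_size_assumed) == 16,
              "aligned sizes are unchanged");
```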
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112224504 From shade at openjdk.org Wed May 28 15:54:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 15:54:15 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v2] In-Reply-To: References: Message-ID: > There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. > > But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. > > > $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ > -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done > > # Before > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > # After > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > > It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. > > Additional testing: > - [x] New regression test passes with the fix, fails without it > - [ ] GHA Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Better test, patch amendments - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Unnecessary arch limitation - Simplify test - Adjust test bound - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24972/files - new: https://git.openjdk.org/jdk/pull/24972/files/8e3366a1..e682962c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=00-01 Stats: 127063 lines in 2732 files changed: 77275 ins; 34845 del; 14943 mod Patch: https://git.openjdk.org/jdk/pull/24972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24972/head:pull/24972 PR: https://git.openjdk.org/jdk/pull/24972 From shade at openjdk.org Wed May 28 15:54:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 15:54:16 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v2] In-Reply-To: References: Message-ID: On Mon, 5 May 2025 07:18:23 GMT, Damon Fenacci wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Better test, patch amendments >> - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count >> - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count >> - Unnecessary arch limitation >> - Simplify test >> - Adjust test bound >> - Fix > > test/hotspot/jtreg/compiler/arguments/TestCompilerCounts.java line 64: > >> 62: ProcessBuilder pb = ProcessTools.createLimitedTestJavaProcessBuilder(args); >> 63: OutputAnalyzer output = new OutputAnalyzer(pb.start()); >> 64: output.shouldHaveExitValue(0); > > I was wondering if we should check the output as well, e.g. with a test that prints the actual number of compiler threads (like the one in the description, to make it a bit more like a regression test). Added the test now! Took a while to figure out how to do it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2112239915 From shade at openjdk.org Wed May 28 16:01:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 16:01:23 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v2] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 15:54:15 GMT, Aleksey Shipilev wrote: >> There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. >> >> But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. These threads would just step on each other toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. 
>> >> >> $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ >> -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ >> -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done >> >> # Before >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 2 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> # After >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 1 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 3 >> intx CICompilerCount = 4 >> >> >> It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way studying Leyden in tight CPU scenarios. >> >> Additional testing: >> - [x] New regression test passes with the fix, fails without it >> - [ ] GHA > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Better test, patch amendments > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count > - Unnecessary arch limitation > - Simplify test > - Adjust test bound > - Fix Update: Added the test that verifies heuristics performs as expected. This test also explores what happens with explicit `CICompilerCount` setting, varying the CPU sizes, etc. This allowed me to simplify the actual VM code: what we used to assert, we now directly test. I also had to add `CICompilerCount=0` for `Xint` mode, so that test is straightforward. 
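The before/after `CICompilerCount` values quoted above are consistent with a simple model of the default heuristic. The sketch below is that model only. The `log * loglog * 3 / 2` formula is my recollection of `CompilationPolicy::initialize` and should be checked against the source, `min_count` is a stand-in parameter (2 before the fix, 1 after, for the `-XX:-TieredCompilation` case shown), and all function names are hypothetical.

```cpp
// floor(log2(x)) for x >= 1; hypothetical stand-in for HotSpot's log2i.
constexpr int log2i_sketch(int x) {
  return x <= 1 ? 0 : 1 + log2i_sketch(x / 2);
}

constexpr int max2_sketch(int a, int b) { return a > b ? a : b; }

// Assumed shape of the heuristic: log(n) * log(log(n)) * 3 / 2.
constexpr int heuristic_sketch(int log_cpu) {
  return log_cpu * log2i_sketch(max2_sketch(log_cpu, 1)) * 3 / 2;
}

// The heuristic result is clamped from below by min_count.
constexpr int compiler_count_sketch(int active_cpus, int min_count) {
  return max2_sketch(heuristic_sketch(log2i_sketch(active_cpus)), min_count);
}

// "Before" column of the quoted output (min_count = 2).
static_assert(compiler_count_sketch(1, 2) == 2, "1 CPU, before");
static_assert(compiler_count_sketch(3, 2) == 2, "3 CPUs, before");
// "After" column of the quoted output (min_count = 1).
static_assert(compiler_count_sketch(1, 1) == 1, "1 CPU, after");
static_assert(compiler_count_sketch(3, 1) == 1, "3 CPUs, after");
// Unchanged rows: 4..7 CPUs give 3 threads, 8 CPUs give 4.
static_assert(compiler_count_sketch(4, 1) == 3, "4 CPUs");
static_assert(compiler_count_sketch(7, 1) == 3, "7 CPUs");
static_assert(compiler_count_sketch(8, 1) == 4, "8 CPUs");
```

Under this model the heuristic term is 0 for 1..3 CPUs, so the result in that range is decided entirely by the lower clamp, which is why only those rows change.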
------------- PR Comment: https://git.openjdk.org/jdk/pull/24972#issuecomment-2916855736 From shade at openjdk.org Wed May 28 16:01:23 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 16:01:23 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v2] In-Reply-To: References: Message-ID: On Wed, 30 Apr 2025 19:42:41 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/compiler/compilationPolicy.cpp line 471: >> >>> 469: count = MAX2(max_count, min_count); >>> 470: } >>> 471: assert((!c1_only && !c2_only) || count <= active_cpus, "Too many threads: %d", count); >> >> Should it be the general rule: don't create more compiler threads than available cpus? > > Except when specified on command line with `-XX:CICompilerCount=n`. > Actually your changes does not take this flag into account. I think we should allow users to run with more threads than CPUs, if they really want it. (This is also handy for testing the heuristics for very large CPU counts without having too many real CPUs.) What we surely need to do is to teach _heuristics_ never go over the CPU count. I now added the more precise test that explores `CICompilerCount` settings as well, verifying heuristics works as expected. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24972#discussion_r2112250542 From eastigeevich at openjdk.org Wed May 28 16:05:59 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 28 May 2025 16:05:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:44:46 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). 
>> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add JVMCINMethodData::has_mirror() src/hotspot/share/code/nmethod.cpp line 770: > 768: > 769: void nmethod::clear_inline_caches() { > 770: assert(SafepointSynchronize::is_at_safepoint() || is_not_installed(), "clearing of IC's only allowed at safepoint"); Could you correct the note to the assert? The current note contradicts with `is_not_installed()` which can happen not at a safepoint. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112261840 From jbhateja at openjdk.org Wed May 28 16:31:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 28 May 2025 16:31:01 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX Message-ID: A) Patch extends the following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. compiler/intrinsics/bmi/verifycode/AndnTestI.java compiler/intrinsics/bmi/verifycode/AndnTestL.java compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java compiler/intrinsics/bmi/verifycode/LZcntTestL.java compiler/intrinsics/bmi/verifycode/TZcntTestL.java B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. 
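The AndN shape mentioned in (B) can be illustrated with scalar arithmetic: an And whose operand is `(Xor SRC, -1)` computes the same value as a single and-not of `SRC` with the other operand. The sketch below is a plain C++ model with hypothetical function names. It says nothing about matcher costs or predicates, which are specific to the `.ad` rules in the patch.

```cpp
#include <cstdint>

// Scalar model of the matched IR shape: (And (Xor src, -1) other).
constexpr uint32_t xor_then_and_sketch(uint32_t src, uint32_t other) {
  return (src ^ 0xFFFFFFFFu) & other;
}

// Scalar model of what an and-not instruction computes: (~src) & other.
constexpr uint32_t andn_sketch(uint32_t src, uint32_t other) {
  return ~src & other;
}

// Both forms agree, so the two-node pattern can be covered by one instruction.
static_assert(xor_then_and_sketch(0xF0F0u, 0xFFFFu) ==
              andn_sketch(0xF0F0u, 0xFFFFu), "same value");
static_assert(andn_sketch(0xF0F0u, 0xFFFFu) == 0x0F0Fu, "worked example");
```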
Above tests are now passing, validations were carried out using Intel Software Development emulator. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - 8357982: Fix several failing BMI tests with -XX:+UseAPX Changes: https://git.openjdk.org/jdk/pull/25501/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25501&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357982 Stats: 62 lines in 6 files changed: 48 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/25501.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25501/head:pull/25501 PR: https://git.openjdk.org/jdk/pull/25501 From duke at openjdk.org Wed May 28 16:36:06 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 May 2025 16:36:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: <8DQR85GUuidMuFZwfUhWmijjgqwampveiFByFDR3hkc=.c95def20-f62d-47d8-b8ae-622758a73a9d@github.com> On Wed, 28 May 2025 15:43:21 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Add JVMCINMethodData::has_mirror() > > src/hotspot/share/code/nmethod.cpp line 1167: > >> 1165: #endif >> 1166: + align_up(debug_info->data_size() , oopSize) >> 1167: + align_up((int)sizeof(int) , oopSize); > > Why do we need `align_up((int)sizeof(int) , oopSize)`? That is for the int that keeps track of the number of nmethods using the immutable data. 
That way it can be shared between the old and new nmethod. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112316704 From duke at openjdk.org Wed May 28 16:49:05 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 May 2025 16:49:05 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> On Wed, 28 May 2025 13:31:48 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Add JVMCINMethodData::has_mirror() > > src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 90: > >> 88: } >> 89: } >> 90: call->set_destination(x); > > The new code does not update the trampoline with `x`. Also you need to properly handle the case of `trampoline` being null. IMO it should never be null. So `if` is not needed. I'd use `guarantee` here. The trampoline should never be null when compiled with C1/C2. However, when running on a debug build `Assembler::reachable_from_branch_at` uses 2M (on aarch64) for the branch range, whereas Graal always uses the max of 128M regardless of release/debug. In that case it is possible for `trampoline` to be null. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112341501 From rehn at openjdk.org Wed May 28 16:53:02 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 28 May 2025 16:53:02 GMT Subject: RFR: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent Message-ID: Hi please consider. As ref: https://github.com/openjdk/jdk/pull/25483 As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. Sanity tested, running t1.
/Robbin ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/25502/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25502&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357968 Stats: 27 lines in 1 file changed: 0 ins; 18 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/25502.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25502/head:pull/25502 PR: https://git.openjdk.org/jdk/pull/25502 From shade at openjdk.org Wed May 28 17:09:02 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 17:09:02 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks In-Reply-To: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: On Wed, 28 May 2025 17:00:48 GMT, Martin Doerr wrote: > In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. (The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) > We only need to use the correct NullPointerException entry in the compiler case. > > With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. Not sure if you are looking for reviews outside normal PPC maintainer circle, but this looks entirely reasonable. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25504#pullrequestreview-2875830917 From mdoerr at openjdk.org Wed May 28 17:09:01 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 May 2025 17:09:01 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks Message-ID: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. 
(The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) We only need to use the correct NullPointerException entry in the compiler case. With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. ------------- Commit messages: - 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks Changes: https://git.openjdk.org/jdk/pull/25504/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25504&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357793 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25504/head:pull/25504 PR: https://git.openjdk.org/jdk/pull/25504 From eosterlund at openjdk.org Wed May 28 17:12:53 2025 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 28 May 2025 17:12:53 GMT Subject: RFR: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:47:06 GMT, Robbin Ehn wrote: > Hi please consider. > > As ref: https://github.com/openjdk/jdk/pull/25483 > As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. > > Sanity tested, running t1. > > /Robbin Great stuff. Thanks for fixing. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25502#pullrequestreview-2875843971 From never at openjdk.org Wed May 28 17:41:06 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 28 May 2025 17:41:06 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v15] In-Reply-To: References: Message-ID: On Thu, 8 May 2025 21:21:11 GMT, Erik Österlund wrote: >> Chad Rakoczy has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 54 additional commits since the last revision: >> >> - Fix null check >> - Remove unnecessary include >> - Add nullptr check to relocate >> - Fix JVMCI nmethod data >> - Unexclude JVMCI methods >> - Add relocate_nmethod_mirror >> - Only hold NMethodState_lock when needed >> - Exclude JVMCI nmethods >> - Remove StressNMethodRelocation >> - Fix branch_range revert >> - ... and 44 more: https://git.openjdk.org/jdk/compare/4976a8f3...9ca3563a > > src/hotspot/share/jvmci/jvmciRuntime.cpp line 858: > >> 856: >> 857: JVMCIEnv* jvmciEnv = nullptr; >> 858: HotSpotJVMCI::InstalledCode::set_address(jvmciEnv, nmethod_mirror, (jlong)(nm)); > > What's the sync story here? Any lock protecting this? If not, I wonder if readers are okay with inconsistencies. I haven't checked. In the current implementation the fields of InstalledCode are initialized to valid values from the nmethod* during code installation. Those fields only ever transition to 0 as part of nmethod invalidation. Hosted methods may read `InstalledCode.entryPoint` and dispatch to it if it's non-null. So a transition of these values should be safe if they moved from a non-null value to another non-null value and the existing nmethod stayed alive until the next safepoint in the normal nmethod reclamation cycle. Currently writes to those fields by the VM are done in make_not_entrant or at a safepoint so we might want to perform more explicit locking to support transfer of these values. We might consider revisiting the design of InstalledCode itself now that Graal is aligned with the JDK. Backward compatibility precluded that in the past. That might simplify the whole thing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112428990 From never at openjdk.org Wed May 28 17:45:59 2025 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 28 May 2025 17:45:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:44:46 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add JVMCINMethodData::has_mirror() Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23573#pullrequestreview-2875929107 From shade at openjdk.org Wed May 28 18:05:12 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 18:05:12 GMT Subject: RFR: 8356000: C1/C2-only modes use 2 compiler threads on low CPU count machines [v3] In-Reply-To: References: Message-ID: > There is an unfortunate limitation with default tiered policy that we would have at least 2 threads on 1 CPU machine: 1 thread for C1, and 1 thread for C2. > > But if we select C1-only or C2-only modes, we _also_ get 2 compiler threads, for which we have no good reason. 
These threads would just step on each other's toes. The fix changes the behavior for 1..3 CPU hosts in C1/C2-only configurations, by using 1 thread instead of 2 threads. The change for 1 CPU config is what we really need. The change in 2..3 CPU configs is an additional effect, but I think it is still good not to use 100%/66% of the CPUs in those configurations as well. > > > $ for I in `seq 1 8`; do build/linux-x86_64-server-release/images/jdk/bin/java \ > -XX:-TieredCompilation -XX:ActiveProcessorCount=${I} \ > -XX:+PrintFlagsFinal 2>&1 | grep "CICompilerCount "; done > > # Before > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 2 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > # After > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 1 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 3 > intx CICompilerCount = 4 > > > It is a minor bug in `CompilationPolicy::initialize`, but it gets in the way of studying Leyden in tight CPU scenarios. > > Additional testing: > - [x] New regression test passes with the fix, fails without it > - [x] GHA Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains eight additional commits since the last revision: - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Better test, patch amendments - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Merge branch 'master' into JDK-8356000-c1-c2-compiler-count - Unnecessary arch limitation - Simplify test - Adjust test bound - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24972/files - new: https://git.openjdk.org/jdk/pull/24972/files/e682962c..f8519b46 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24972&range=01-02 Stats: 8002 lines in 258 files changed: 5728 ins; 1485 del; 789 mod Patch: https://git.openjdk.org/jdk/pull/24972.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24972/head:pull/24972 PR: https://git.openjdk.org/jdk/pull/24972 From fbredberg at openjdk.org Wed May 28 18:09:50 2025 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 28 May 2025 18:09:50 GMT Subject: RFR: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:47:06 GMT, Robbin Ehn wrote: > Hi please consider. > > As ref: https://github.com/openjdk/jdk/pull/25483 > As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. > > Sanity tested, running t1. > > /Robbin Looks good to me. ------------- Marked as reviewed by fbredberg (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/25502#pullrequestreview-2875993513 From zzambers at openjdk.org Wed May 28 18:39:27 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Wed, 28 May 2025 18:39:27 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v3] In-Reply-To: References: Message-ID: > This change adds ` -XX:-IgnoreUnrecognizedVMOptions` to problematic tests (or `@requires vm.compiler2.enabled` in one case), to prevent failures `Unrecognized VM option` on client VM. Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: Fix of compiler tests for client VM ------------- Changes: https://git.openjdk.org/jdk/pull/24262/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24262&range=02 Stats: 165 lines in 71 files changed: 57 ins; 0 del; 108 mod Patch: https://git.openjdk.org/jdk/pull/24262.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24262/head:pull/24262 PR: https://git.openjdk.org/jdk/pull/24262 From shade at openjdk.org Wed May 28 18:44:50 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 18:44:50 GMT Subject: RFR: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:47:06 GMT, Robbin Ehn wrote: > Hi please consider. > > As ref: https://github.com/openjdk/jdk/pull/25483 > As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. > > Sanity tested, running t1. > > /Robbin Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/25502#pullrequestreview-2876087208 From zzambers at openjdk.org Wed May 28 18:48:51 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Wed, 28 May 2025 18:48:51 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 05:53:48 GMT, Emanuel Peter wrote: >> Zdenek Zambersky has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: >> >> Fix of compiler tests for client VM > > I quickly looked through the changes, and I think it looks ok. It's a little painful to have to add it everywhere though... It also increases the risk of misspelled flags, or using removed flags etc. But I don't have a great alternative solution. > > I'll run some internal testing now, please ping me again in 24h :) @eme64 I have rebased my changes on master and fixed conflicts. (caused by integration of [JDK-8350457](https://github.com/openjdk/jdk/pull/24522)) I have also updated PR description. (I have not changed JIRA as there is no info about fix. Should I add it there?) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2917290122 From mdoerr at openjdk.org Wed May 28 19:12:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 May 2025 19:12:55 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks [v2] In-Reply-To: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: > In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. (The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) > We only need to use the correct NullPointerException entry in the compiler case. 
> > With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Fix bastore without ImplicitNullChecks. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25504/files - new: https://git.openjdk.org/jdk/pull/25504/files/fe0a8a16..bf0d04a4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25504&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25504&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25504.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25504/head:pull/25504 PR: https://git.openjdk.org/jdk/pull/25504 From mdoerr at openjdk.org Wed May 28 19:12:55 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 May 2025 19:12:55 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks [v2] In-Reply-To: References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: On Wed, 28 May 2025 17:05:03 GMT, Aleksey Shipilev wrote: > Not sure if you are looking for reviews outside normal PPC maintainer circle, but this looks entirely reasonable. Thank you! All reviews are welcome! I've added another simple fix. I think it should be ok to have it in the same PR. It fixes another issue with these flags (found by NullPointerExceptionTest). And we should backport all these fixes.
------------- PR Comment: https://git.openjdk.org/jdk/pull/25504#issuecomment-2917346740 From shade at openjdk.org Wed May 28 19:20:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 28 May 2025 19:20:51 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks [v2] In-Reply-To: References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> Message-ID: <-UibPyac7WNglpox4aStyTgtbwHTob56wOT3KrJnP2s=.1f642951-9b49-43ab-9fde-765a62d673a3@github.com> On Wed, 28 May 2025 19:12:55 GMT, Martin Doerr wrote: >> In case of -XX:-UseSIGTRAP -XX:-ImplicitNullChecks, we use the manually selected entry. (The same is true for -XX:-TrapBasedNullChecks -XX:-ImplicitNullChecks.) >> We only need to use the correct NullPointerException entry in the compiler case. >> >> With this patch, the manually selected entry matches the one selected by `PosixSignals::pd_hotspot_signal_handler`. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Fix bastore without ImplicitNullChecks. Sure. This is where my knowledge of the PPC64 template interpreter ends. I guess this is still fine. My understanding: without implicit null checks, we have to do an explicit null check here. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25504#pullrequestreview-2876179378 From vlivanov at openjdk.org Wed May 28 19:37:54 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 May 2025 19:37:54 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 11:45:21 GMT, Aleksey Shipilev wrote: >> `ReachabilityFence` accepts only OOPs as a referent and `DecodeNKlass` produces a `Klass` pointer. >> >> I suspect it may be the case for safepoints as well (and `is_DecodeNarrowPtr()` is a leftover from PermGen world), but I didn't check.
> > Right, nevermind about `DecodeNKlass` then. My question on heap loads still stands: do we actually get `reachabilityFence(someField)` from anywhere? Are you asking specifically about `ReachabilityFence -> DecodeN -> LoadN` shape? Yes, it's common, especially after inlining. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2112616254 From mdoerr at openjdk.org Wed May 28 19:39:53 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 28 May 2025 19:39:53 GMT Subject: RFR: 8357793: [PPC64] VM crashes with -XX:-UseSIGTRAP -XX:-ImplicitNullChecks [v2] In-Reply-To: <-UibPyac7WNglpox4aStyTgtbwHTob56wOT3KrJnP2s=.1f642951-9b49-43ab-9fde-765a62d673a3@github.com> References: <5XqAA3Z2G0uwOBkitUrqkG3Y68xtpRuvBwj_cEIFECs=.18259520-6f73-406f-a46f-fa025c12b303@github.com> <-UibPyac7WNglpox4aStyTgtbwHTob56wOT3KrJnP2s=.1f642951-9b49-43ab-9fde-765a62d673a3@github.com> Message-ID: On Wed, 28 May 2025 19:18:14 GMT, Aleksey Shipilev wrote: > Sure. This is where my knowledge of PPC64 template interpreter ends. I guess this is still fine. My understanding: without implicit null checks, we have to do an explicit null check here. Exactly. `make run-test TEST=hotspot:tier1 JTREG="VM_OPTIONS=-XX:-UseSIGTRAP -XX:-ImplicitNullChecks"` has passed with this PR, now. Thanks again! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25504#issuecomment-2917411379 From kxu at openjdk.org Wed May 28 20:21:35 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 28 May 2025 20:21:35 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v4] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. 
> > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not sure if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: - further refactor is_counted_loop() by extracting functions - WIP: refactor is_counted_loop() - WIP: refactor is_counted_loop() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/4d7738c8..25cfe289 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=02-03 Stats: 469 lines in 2 files changed: 250 ins; 145 del; 74 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From duke at openjdk.org Wed May 28 20:27:29 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 May 2025 20:27:29 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates Message-ID: [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) The reasoning for this change is the same as the x86 version's PR: > First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. > > Second, we care about overflows for 64-bit for some reason. I think this is reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for a 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this.
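As a rough check of the overflow argument quoted above, the time to wrap an N-bit counter is 2^N divided by the update rate. The sketch below assumes a sustained rate of one billion increments per second, which is an illustrative figure and not a measurement from either PR; `years_to_overflow` and `print_overflow_estimates` are hypothetical helpers, not HotSpot code.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper, for illustration only: years until an unsigned counter
// of the given width wraps, at an assumed constant increment rate.
double years_to_overflow(int bits, double increments_per_second) {
    const double seconds_per_year = 365.25 * 24.0 * 3600.0;
    const double total_increments = std::ldexp(1.0, bits);  // 2^bits
    return total_increments / increments_per_second / seconds_per_year;
}

// Prints the estimate for 32-bit and 64-bit counters.
void print_overflow_estimates() {
    const double assumed_rate = 1e9;  // assumption, not taken from the PR
    std::printf("32-bit: %.3g years\n", years_to_overflow(32, assumed_rate));
    std::printf("64-bit: %.3g years\n", years_to_overflow(64, assumed_rate));
}
```

Under that assumed rate a 32-bit counter wraps within seconds, while a 64-bit counter needs several hundred years, which is in line with the reasoning for dropping the overflow handling.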
Additional testing: - [x] Linux aarch64 fastdebug tier 1/2/3/4 ------------- Commit messages: - Give register better name - Remove unused code Changes: https://git.openjdk.org/jdk/pull/25512/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25512&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357223 Stats: 47 lines in 2 files changed: 0 ins; 32 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/25512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25512/head:pull/25512 PR: https://git.openjdk.org/jdk/pull/25512 From eastigeevich at openjdk.org Wed May 28 20:38:03 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 28 May 2025 20:38:03 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> References: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> Message-ID: On Wed, 28 May 2025 16:46:31 GMT, Chad Rakoczy wrote: >> src/hotspot/cpu/aarch64/relocInfo_aarch64.cpp line 90: >> >>> 88: } >>> 89: } >>> 90: call->set_destination(x); >> >> The new code does not update the trampoline with `x`. Also you need to properly handle the case of `trampoline` being null. IMO it should never be null. So `if` is not needed. I'd use `guarantee` here. > > The trampoline should never be null when compiled with C1/C2. However, when running on a debug build `Assembler::reachable_from_branch_at` uses 2M (on aarch64) for the branch range, whereas Graal always uses the max of 128M regardless of release/debug. In that case it is possible for `trampoline` to be null. If a trampoline is null, it is a critical situation. The patched call instruction will be incorrect.
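To make the range question concrete: an AArch64 `bl` instruction stores a signed 26-bit word offset, so it can reach targets within 128M of the call site in either direction. The standalone sketch below models encoding with a plain mask and then decodes the result the way the CPU would. The helper names (`encode_bl`, `decode_bl`, `fits_in_bl_range`) are hypothetical and are not HotSpot or Graal functions; the code only illustrates the bit arithmetic.

```cpp
#include <cstdint>

// Hypothetical model of a BL encoder that masks the offset without a range check.
inline uint32_t encode_bl(int64_t byte_offset) {
    const uint32_t opcode = 0x25u << 26;  // 0b100101 in bits 31..26
    const uint32_t imm26 = static_cast<uint32_t>(byte_offset >> 2) & 0x03FFFFFFu;
    return opcode | imm26;
}

// Decode the byte offset the instruction really encodes:
// sign-extend the 26-bit immediate from bit 25, then scale by 4.
inline int64_t decode_bl(uint32_t insn) {
    int64_t imm26 = insn & 0x03FFFFFFu;
    if (imm26 & 0x02000000) {
        imm26 -= 0x04000000;
    }
    return imm26 * 4;
}

// True when the offset fits the signed 28-bit byte range of BL (within 128M).
inline bool fits_in_bl_range(int64_t byte_offset) {
    const int64_t limit = int64_t{1} << 27;  // 128M
    return byte_offset >= -limit && byte_offset < limit;
}
```

In this model an offset of 3M, which is beyond a 2M limit but well within 128M, survives the encode/decode round trip unchanged, whereas an offset of 200M decodes to a different, negative displacement because its high bits were dropped by the mask.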
`NativeCall::set_destination` does not check whether a destination is reachable:
```c++
void set_destination(address dest) {
  int offset = dest - instruction_address();
  unsigned int insn = 0b100101 << 26;
  assert((offset & 3) == 0, "should be");
  offset >>= 2;
  offset &= (1 << 26) - 1; // mask off insn part
  insn |= offset;
  set_int_at(displacement_offset, insn);
}
```
So higher bits will be masked out. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112706525 From duke at openjdk.org Wed May 28 20:45:57 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 May 2025 20:45:57 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> Message-ID: On Wed, 28 May 2025 20:35:12 GMT, Evgeny Astigeevich wrote: > The patched call instruction will be incorrect. That's not entirely correct. The null trampoline check is needed because on debug builds branches of distance >2M will fall into the `if (!Assembler::reachable_from_branch_at(addr(), x))` block but Graal would not have generated a trampoline for that call because it is still <128M. It is still safe to use that distance but it is just different than what HotSpot expects ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112716099 From duke at openjdk.org Wed May 28 20:45:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 May 2025 20:45:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> Message-ID: <0tmWzYMOS7jyjgoJL0mBMRywf6mCEBkSTQ7jdRE7Xtg=.5857550c-35e0-4cbb-8bd8-0542ae1b70a5@github.com> On Wed, 28 May 2025 20:41:43 GMT, Chad Rakoczy wrote: >> If a trampoline is null, it is a critical situation. The patched call instruction will be incorrect.
>> `NativeCall::set_destination` does not check whether a destination is reachable:
>> ```c++
>> void set_destination(address dest) {
>>   int offset = dest - instruction_address();
>>   unsigned int insn = 0b100101 << 26;
>>   assert((offset & 3) == 0, "should be");
>>   offset >>= 2;
>>   offset &= (1 << 26) - 1; // mask off insn part
>>   insn |= offset;
>>   set_int_at(displacement_offset, insn);
>> }
>> ```
>> So higher bits will be masked out. > >> The patched call instruction will be incorrect. > > That's not entirely correct. The null trampoline check is needed because on debug builds branches of distance >2M will fall into the `if (!Assembler::reachable_from_branch_at(addr(), x))` block but Graal would not have generated a trampoline for that call because it is still <128M. It is still safe to use that distance but it is just different than what HotSpot expects If we want to guarantee that a trampoline exists if `Assembler::reachable_from_branch_at` fails we would need to update Graal to use the check as well ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112717962 From sparasa at openjdk.org Wed May 28 20:56:59 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Wed, 28 May 2025 20:56:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> Message-ID: On Mon, 26 May 2025 22:57:11 GMT, Srinivas Vamsi Parasa wrote: >>> @vamsi-parasa Testing launched, ping me again in 24h :) >> >> Thanks Emanuel (@eme64)! Please let me know if there are any issues with the tests.
> >> @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) > > Hi Emanuel (@eme64), > > Thanks for the update! The new changes got approved and are ready for testing. > Could you please launch the tests? > > Thanks, > Vamsi > @vamsi-parasa Launched! Hi Emanuel (@eme64), Could you pls let me know when the testing is completed? Will integrate it if everything looks good. Thanks, Vamsi ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2917587711 From vlivanov at openjdk.org Wed May 28 21:48:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 28 May 2025 21:48:52 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: Message-ID: <6v0ccqYSZk4OfbMCasFpwmVt7MGIV3Hw0f1ff7BcpJY=.b105566a-88ef-4171-88fe-54b139b00af0@github.com> On Mon, 26 May 2025 11:02:25 GMT, Aleksey Shipilev wrote: > But the whole point of this PR is that "current behavior" is incorrect, isn't it? Strictly speaking, current implementation has a defect and it requires a complete rewrite on C2 side to properly fix it. Current implementation is part of JDK for a long time (since 11). It's highly unlikely it'll be backported all the way to JDK 11 and it's an open question whether it should be backported at all. So, for diagnostic purposes it makes sense to provide a way to compare old and new implementations irrespective of whether old implementation still has the bug. > In other words, let's not rely on intrinsic to work for correctness; non-intrinsified version should be correct as well. A question for you: do you think we should test non-intrinsified case? Personally, I consider such requirement as way too strong. In this particular case, the method is unconditionally intrinsified in C2. If no intrinsification takes place, it's a bug. 
(I'm fine with adding an assert & abort compilation if C2 ever observes `Reference.reachabilityFence()` to be inlined.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2112806638 From eastigeevich at openjdk.org Wed May 28 22:24:58 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Wed, 28 May 2025 22:24:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 21:44:46 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Add JVMCINMethodData::has_mirror() src/hotspot/share/code/nmethod.cpp line 1580: > 1578: nm_copy->method()->set_code(mh, nm_copy); > 1579: make_not_used(); > 1580: } If `nm_copy->method()->code() != this`, we will return the copy which points at a method owning another code. This might be useful. Or we might return a broken copy. Should we allow this? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112865163 From duke at openjdk.org Wed May 28 22:35:01 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 28 May 2025 22:35:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 22:22:29 GMT, Evgeny Astigeevich wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Add JVMCINMethodData::has_mirror() > > src/hotspot/share/code/nmethod.cpp line 1580: > >> 1578: nm_copy->method()->set_code(mh, nm_copy); >> 1579: make_not_used(); >> 1580: } > > If `nm_copy->method()->code() != this`, we will return the copy which points at a method owning another code. This might be useful. Or we might return a broken copy. > Should we allow this? That's an interesting point. Now that I think about it I'm curious when `nm_copy->method()->code() != this` would actually happen. It's possible for nmethods that have been recompiled. In that situation though the first nmethod would have been marked as not used and would have failed the `is_relocatable` check in the first place. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2112873918 From xgong at openjdk.org Thu May 29 01:47:56 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Thu, 29 May 2025 01:47:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 12:26:31 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >> Also made a bit change to support pattern `VectorMask.fromLong()`. 
>> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. >> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. 
>> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4... > > test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 49: > >> 47: private static final VectorSpecies L_SPECIES = LongVector.SPECIES_MAX; >> 48: private static final VectorSpecies F_SPECIES = FloatVector.SPECIES_MAX; >> 49: private static final VectorSpecies D_SPECIES = DoubleVector.SPECIES_MAX; > > Are you taking `SPECIES_MAX` on purpose here, or could we take `SPECIES_PREFERRED` instead? > @jatin-bhateja What is the best to do in these tests? I suppose best would be to test with all vector lengths... Thanks for pointing out this @eme64 ! Per my understanding, `SPECIES_MAX` is almost the same with `SPECIES_PREFERRED` in this case which are all specified to the max vector size of a hardware. Since the max vector size is different on different architectures, not all vector lengths are supported to be intrinsified on a specified architecture like AArch64, especially the SVE arch with different vector register size. Hence, just testing the max species makes sense to me as this is a mid-end common transformation. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113020416 From fyang at openjdk.org Thu May 29 05:50:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 29 May 2025 05:50:51 GMT Subject: RFR: 8357968: RISC-V: Interpreter volatile reference stores with G1 are not sequentially consistent In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:47:06 GMT, Robbin Ehn wrote: > Hi please consider. > > As ref: https://github.com/openjdk/jdk/pull/25483 > As suggested in that PR - I removed these helpers as it's very hard to see that you get registers clobbered. > > Sanity tested, running t1. > > /Robbin Thanks! I performed `hs:tier1`-`hs:tier3` test on SG2042 platform. Result looks good. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25502#pullrequestreview-2877164529 From shade at openjdk.org Thu May 29 06:18:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 29 May 2025 06:18:51 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 28 May 2025 20:21:20 GMT, Chad Rakoczy wrote: > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > > The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > > The reasoning for this change is the same as the x86 version's PR: > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >> >> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
> > Additional testing: > > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Looks fine, with a nit: src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 950: > 948: assert(DataLayout::counter_increment == 1, > 949: "flow-free idiom only works with 1"); > 950: This assert is unnecessary, IMO. x86 version removed it as well. ------------- PR Review: https://git.openjdk.org/jdk/pull/25512#pullrequestreview-2877213152 PR Review Comment: https://git.openjdk.org/jdk/pull/25512#discussion_r2113285193 From shade at openjdk.org Thu May 29 06:53:54 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 29 May 2025 06:53:54 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: <6v0ccqYSZk4OfbMCasFpwmVt7MGIV3Hw0f1ff7BcpJY=.b105566a-88ef-4171-88fe-54b139b00af0@github.com> References: <6v0ccqYSZk4OfbMCasFpwmVt7MGIV3Hw0f1ff7BcpJY=.b105566a-88ef-4171-88fe-54b139b00af0@github.com> Message-ID: On Wed, 28 May 2025 21:46:37 GMT, Vladimir Ivanov wrote: >> But the whole point of this PR is that "current behavior" is incorrect, isn't it? >> >> I think disabling `_Reference_reachabilityFence` intrinsic (or, failing to inline the intrinsic for some other reason) should fail-safe to non-inlined method, not fail-deadly to a broken RF. In other words, let's not rely on intrinsic to work for correctness; non-intrinsified version should be correct as well. >> >> I agree `@DontInline` would require a bit of extra fiddling in C1, but I suspect it should be as easy as copy-pasting a few hunks around `LIRGenerator::do_blackhole`. > >> But the whole point of this PR is that "current behavior" is incorrect, isn't it? > > Strictly speaking, current implementation has a defect and it requires a complete rewrite on C2 side to properly fix it. > > Current implementation is part of JDK for a long time (since 11). It's highly unlikely it'll be backported all the way to JDK 11 and it's an open question whether it should be backported at all. 
So, for diagnostic purposes it makes sense to provide a way to compare old and new implementations irrespective of whether old implementation still has the bug. > >> In other words, let's not rely on intrinsic to work for correctness; non-intrinsified version should be correct as well. > > A question for you: do you think we should test non-intrinsified case? > > Personally, I consider such requirement as way too strong. In this particular case, the method is unconditionally intrinsified in C2. If no intrinsification takes place, it's a bug. (I'm fine with adding an assert & abort compilation if C2 ever observes `Reference.reachabilityFence()` to be inlined.) I think two general principles apply here: a) intrinsics are performance optimizations, not correctness building blocks; b) when _forced to choose_, we prefer correctness over performance. This is not about the backports, but about having the correct fallback when we need it. Imagine, for a second, this fix has a major bug discovered later. We instruct users to disable the intrinsic to avoid that bug. Users then ask: "Is it safe to disable the intrinsic? Would my application crash/misbehave without it? Would it run slower?". Right now we have a choice what we can answer. With `@DontInline`, we say "Yes, it would run slower, but RF would still work against misbehaviors". Without `@DontInline`, we say "Yes, there is a possibility of misbehavior, but at least it would run as fast". One of these answers goes against the general principle (b). I also cannot remember any existing intrinsic that is a counter-example for (a). Without `@DontInline`, RF would be a first? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2113331201 From hgreule at openjdk.org Thu May 29 07:08:21 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 29 May 2025 07:08:21 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v3] In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. > > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. > > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with its signed variants. 
This change diverges them again, but similar improvements could be made after #17508. > > During experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth to further look into: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, means we miss potential cases were this would help e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd. Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: Use BasicType for shared implementation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25254/files - new: https://git.openjdk.org/jdk/pull/25254/files/20fe91d6..f93aeb12 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=01-02 Stats: 112 lines in 1 file changed: 12 ins; 71 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From hgreule at openjdk.org Thu May 29 07:11:52 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 29 May 2025 07:11:52 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: <_SEjgOdrZ2FxZK-Mm_oFuER5MtCopOmXoc9KCSigFoU=.e3182f94-6683-4745-904d-f4192ba30b41@github.com> 
On Wed, 28 May 2025 09:53:32 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update ModL comment >> - Use TOP instead of ZERO >> - Apply suggested test changes > > @SirYwell Thanks for looking into this, that looks promising! > > I have two bigger comments: > - Could we unify the L and I code, either using C++ templating or `BasicType`? It would reduce code duplication. > - Can we have some tests where the input ranges are random as well, and where we check the output ranges with some comparisons? > > ------------------ > Copied from the code comment: > >> Nice work with the examples you already have, and randomizing some of it! >> >> I would like to see one more generalized test. >> - compute `res = lhs % rhs` >> - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. >> - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. >> >> Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. >> >> I hope that makes sense :) >> This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. >> >> This is an example, where I asked someone to try this out as well: >> https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 Thanks @eme64. I unified the code now using `BasicType`. This works well because we can use the jlong operations everywhere (if I didn't miss something, please verify that claim). 
You can probably compare it to the unsigned_mod_value that is currently templated. I assume using BasicType there would be more involved because signed -> unsigned conversion depends on the actual type (i.e. the unsigned value of -1 is different for long vs int). I'll also look into your suggestions for the tests, thanks for the input there. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25254#issuecomment-2918524098 From duke at openjdk.org Thu May 29 07:57:56 2025 From: duke at openjdk.org (erifan) Date: Thu, 29 May 2025 07:57:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: <9u6hJ-WgnHLMaYBa8ViRdpUZY-bI2wOk-TCRKWJJdqk=.b3303f1f-da3b-4c2e-8f0c-a2e16ba9688e@github.com> References: <9u6hJ-WgnHLMaYBa8ViRdpUZY-bI2wOk-TCRKWJJdqk=.b3303f1f-da3b-4c2e-8f0c-a2e16ba9688e@github.com> Message-ID: On Wed, 28 May 2025 12:16:23 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.cpp line 2244: >> >>> 2242: // BoolTest doesn't support unsigned comparisons. >>> 2243: BoolTest::mask neg_cond = >>> 2244: (BoolTest::mask) (((VectorMaskCmpNode*) in1)->get_predicate() ^ 4); >> >> What is the hard-coded `^ 4` here? This whole line looks like we are looking at internals of the `VectorMaskCmpNode` or its predicate, and we should probably do that in some method there? Or maybe it should be part of the `BoolTest(::mask)` interface? > > Also: You now cast `(VectorMaskCmpNode*) in1` twice. Can we not do `as_VectorMaskCmp()`? Or could we at least cast it only once, and then use it as `in1_mask_cmp` instead? > What is the hard-coded ^ 4 here? This is to negate the comparison condition. We can't use `BoolTest::negate()` here because the comparison condition may be **unsigned** comparison. Since there's already a `negate()` function in `BoolTest`, so I tend to add a new function `get_negative_predicate` for this into class `VectorMaskCmpNode`. > Also: You now cast (VectorMaskCmpNode*) in1 twice. 
Can we not do as_VectorMaskCmp()? Or could we at least cast it only once, and then use it as in1_mask_cmp instead? For the first cast, I think you mean if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 || !((VectorMaskCmpNode*) in1)->predicate_can_be_negated() || !VectorNode::is_all_ones_vector(in2)) { return nullptr; } To remove one cast, then we have to split the above `if` because `in1` may not be a `VectorMaskCmpNode`. if (in1->Opcode() != Op_VectorMaskCmp) { return nullptr; } VectorMaskCmpNode* in1_as_mask_cmp = (VectorMaskCmpNode*) in1; if (in1->outcnt() > 1 || !in1_as_mask_cmp->predicate_can_be_negated() || !VectorNode::is_all_ones_vector(in2)) { return nullptr; } BoolTest::mask neg_cond = (BoolTest::mask) (in1_as_mask_cmp->get_predicate() ^ 4); Does this look better to you ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113423376 From duke at openjdk.org Thu May 29 08:02:56 2025 From: duke at openjdk.org (erifan) Date: Thu, 29 May 2025 08:02:56 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 12:12:50 GMT, Emanuel Peter wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >> Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. 
>> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4... > > src/hotspot/share/opto/vectornode.cpp line 2233: > >> 2231: if (in2->Opcode() == Op_VectorMaskCast) { >> 2232: in2 = in2->in(1); >> 2233: } > > Wow, this seems to be an addition that is not covered in the patterns you mention above, right? > But is that even necessary? > I suppose here `in2 = VectorMaskCast(all_ones_vector)`. 
> Would we not already want to transform this pattern in `VectorMaskCast::Ideal`, is that not possible and more powerful? Oh yeah, I forgot to mention it in the above comment and commit message. Yes, this is for `in2 = VectorMaskCast(all_ones_vector)`. I agree it's better to do this transformation in `VectorMaskCast::Ideal`. I'll remove this code change and do the `VectorMaskCast` optimization later. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2113430734 From duke at openjdk.org Thu May 29 08:06:54 2025 From: duke at openjdk.org (erifan) Date: Thu, 29 May 2025 08:06:54 GMT Subject: RFR: 8354242: VectorAPI: combine vector not operation with compare [v6] In-Reply-To: <9f_-MMP_SSYInzCTUc5scRzKOKN1jj4VcnGEYWoOF14=.2eca7e44-d1df-4e57-a513-d8ddaddc9ea2@github.com> References: <9f_-MMP_SSYInzCTUc5scRzKOKN1jj4VcnGEYWoOF14=.2eca7e44-d1df-4e57-a513-d8ddaddc9ea2@github.com> Message-ID: <4a5PBPvkJGne0xghQtU2_IGLvh5ZLcTxr21zICMTuC8=.1f2697d9-e96e-4421-b07d-2f372f54f079@github.com> On Fri, 16 May 2025 07:40:53 GMT, erifan wrote: >> erifan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Refactor the JTReg tests for compare.xor(maskAll) >> >> Also made a bit change to support pattern `VectorMask.fromLong()`. >> - Merge branch 'master' into JDK-8354242 >> - Refactor code >> >> Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this >> optimization, making the code more modular. >> - Merge branch 'master' into JDK-8354242 >> - Update the jtreg test >> - Merge branch 'master' into JDK-8354242 >> - Addressed some review comments >> >> 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. >> 2. Improve code comments. 
>> - Merge branch 'master' into JDK-8354242 >> - Merge branch 'master' into JDK-8354242 >> - 8354242: VectorAPI: combine vector not operation with compare >> >> This patch optimizes the following patterns: >> For integer types: >> ``` >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> ``` >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> ``` >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> ``` >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: >> With option `-XX:UseSVE=2`: >> ``` >> Benchmark Unit Before Score Error After Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4... > > Hi, I have updated the code and I'll file the patch to convert `VectorMask.fromLong(SPECIES, -1)` to `maskAll()` soon, I'll cover this test case in that patch. Would you please help review the patch again, thanks! > @erifan Looks like you really improved things, nice work! 
I have some more comments below :) I will modify the code according to your suggestions and commit again after all the tests pass locally. Thanks for your careful review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24674#issuecomment-2918641646 From aph at openjdk.org Thu May 29 09:24:51 2025 From: aph at openjdk.org (Andrew Haley) Date: Thu, 29 May 2025 09:24:51 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates In-Reply-To: References: Message-ID: On Wed, 28 May 2025 20:21:20 GMT, Chad Rakoczy wrote: > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > > The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > > The reasoning for this change is the same as the x86 version's PR: > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >> >> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this.
> > Additional testing: > > - [x] Linux aarch64 fastdebug tier 1/2/3/4 src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 953: > 951: ldr(rscratch1, addr); > 952: add(rscratch1, rscratch1, DataLayout::counter_increment); > 953: str(rscratch1, addr); Suggestion: increment(addr, DataLayout::counter_increment); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25512#discussion_r2113563809 From eastigeevich at openjdk.org Thu May 29 11:28:01 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 29 May 2025 11:28:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: <0tmWzYMOS7jyjgoJL0mBMRywf6mCEBkSTQ7jdRE7Xtg=.5857550c-35e0-4cbb-8bd8-0542ae1b70a5@github.com> References: <2HApmZeeYmB9G5gttb7G9zKLyTMSQXwrXODoYgvYmQM=.743583e2-7918-4900-9dbd-7223917cf310@github.com> <0tmWzYMOS7jyjgoJL0mBMRywf6mCEBkSTQ7jdRE7Xtg=.5857550c-35e0-4cbb-8bd8-0542ae1b70a5@github.com> Message-ID: <62uTtu5i-RDdM1Lnk0i_2JXoNdbJzcn4CBXdCGBU3B0=.48748b12-0871-46b3-9754-b42943fdbad5@github.com> On Wed, 28 May 2025 20:43:03 GMT, Chad Rakoczy wrote: >>> The patched call instruction will be incorrect. >> >> That's not entirely correct. The null trampoline check is needed because on debug builds branches of distance >2M will fall into the `if (!Assembler::reachable_from_branch_at(addr(), x))` block but Graal would not have generated a trampoline for that call because it is still <128M. It is still safe to use that distance but it is just different than what HotSpot expects > > If we want to guarantee that a trampoline exists if `Assembler::reachable_from_branch_at` fails we would need to update Graal to use the check as well > The null trampoline check is needed because on debug builds branches of distance >2M will fall into the if (!Assembler::reachable_from_branch_at(addr(), x)) block but Graal would not have generated a trampoline for that call because it is still <128M. 
It is still safe to use that distance but it is just different than what HotSpot expects This logic looks strange to me. You are saying that a trampoline is only null in the case of Graal but dest is always valid in this case. This is a bug in Graal: it always uses the 128M branch range even though HotSpot can change the range to smaller values in debug builds. When Graal fixes the bug you will have undefined behaviour in this place. We must handle the situation where no trampoline is available. Options: 1. This is a bug in code generation. If the bug is easy to reproduce with debug builds, use assert. If not, use guarantee. 2. This is an expected case. We need to generate a trampoline. This can be complicated. I think it's a bug situation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2113752991 From dnsimon at openjdk.org Thu May 29 13:04:40 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 29 May 2025 13:04:40 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror Message-ID: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed.
------------- Commit messages: - remove phantom_ref arg from JVMCINMethodData::get_nmethod_mirror Changes: https://git.openjdk.org/jdk/pull/25488/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25488&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357619 Stats: 14 lines in 3 files changed: 1 ins; 5 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/25488.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25488/head:pull/25488 PR: https://git.openjdk.org/jdk/pull/25488 From eosterlund at openjdk.org Thu May 29 13:04:41 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 29 May 2025 13:04:41 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2834: > 2832: // Only the mirror in the HotSpot heap is accessible > 2833: // through JVMCINMethodData > 2834: oop nmethod_mirror = data->get_nmethod_mirror(nm); Is the nmethod guaranteed to be on-stack here? If not it gotta be phantom. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2112390139 From dnsimon at openjdk.org Thu May 29 13:04:41 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 29 May 2025 13:04:41 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> Message-ID: On Wed, 28 May 2025 17:15:48 GMT, Erik Österlund wrote: >> The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. >> This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2834: > >> 2832: // Only the mirror in the HotSpot heap is accessible >> 2833: // through JVMCINMethodData >> 2834: oop nmethod_mirror = data->get_nmethod_mirror(nm); > > Is the nmethod guaranteed to be on-stack here? If not it gotta be phantom. Is the use of `JVMCINMethodHandle` equivalent to `nm` being on-stack?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2112482026 From eosterlund at openjdk.org Thu May 29 13:04:41 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 29 May 2025 13:04:41 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> <5xZ0aIZT-xJM_h06TD061mZ_3T1qAPkd1F75vipRJ_w=.7d0e9d10-3a6c-4d1f-b457-aa0e1dd61560@github.com> Message-ID: On Wed, 28 May 2025 18:10:39 GMT, Doug Simon wrote: >> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 2834: >> >>> 2832: // Only the mirror in the HotSpot heap is accessible >>> 2833: // through JVMCINMethodData >>> 2834: oop nmethod_mirror = data->get_nmethod_mirror(nm); >> >> Is the nmethod guaranteed to be on-stack here? If not it gotta be phantom. > > Is the use of `JVMCINMethodHandle` equivalent to `nm` being on-stack? Yes. Great, so that should be fine then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2112673894 From eastigeevich at openjdk.org Thu May 29 15:01:02 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 29 May 2025 15:01:02 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: On Wed, 28 May 2025 22:32:23 GMT, Chad Rakoczy wrote: >> src/hotspot/share/code/nmethod.cpp line 1580: >> >>> 1578: nm_copy->method()->set_code(mh, nm_copy); >>> 1579: make_not_used(); >>> 1580: } >> >> If `nm_copy->method()->code() != this`, we will return the copy which points at a method owning another code. This might be useful. Or we might return a broken copy. >> Should we allow this? > > That's an interesting point. Now that I think about it I'm curious when `nm_copy->method()->code() != this` would actually happen. It's possible for nmethods that have been recompiled. 
In that situation though the first nmethod would have been marked as not used and would have failed the `is_relocatable` check in the first place. The current protocol: 1. The `is_relocatable` check: if `true`, nmethod is in use. `method()` should be non-null. 2. `CodeCache::gc_on_allocation()`: this can invalidate nmethod. 3. Acquire `Compile_lock`, `CodeCache_lock`: `CodeCache_lock` to prevent CodeCache from modifications. `Compile_lock` blocks deoptimization of nmethod and invalidation of deps. 4. Construct a copy. 5. Invalidate CPU i-cache. 6. Acquire `NMethodState_lock`: the lock prevents nmethod from changing its state. 7. Set the copy as the current code. I think we need the following the protocol: 1. Invoke `CodeCache::gc_on_allocation()` first. 2. Invoke `is_relocatable`. 3. Acquire `Compile_lock`, `CodeCache_lock`. 4. Construct a copy. 5. Acquire `NMethodState_lock`. 6. Check nmethod is not mark for deoptimization and is in use and make_in_use. 6.1. Invalidate CPU i-cache. 6.2. Set code 6.3 Make not used 6.4 Return copy 7. Return nullptr ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114150497 From kxu at openjdk.org Thu May 29 15:07:48 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 29 May 2025 15:07:48 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v5] In-Reply-To: References: Message-ID: > This PR refactors `PhaseIdealLoop::is_counted_loop()` into (mostly) `CountedLoopConverter::is_counted_loop()` and `CountedLoopConverter::convert()` to decouple the detection and conversion code. This enables us to try different loop configurations easily and finally convert once a counted loop is found. > > A nested `PhaseIdealLoop::CountedLoopConverter` class is created to handle the context, but I'm not if this is the best name or place for it. Please let me know what you think. > > Blocks [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759). 
Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 21 additional commits since the last revision: - Merge remote-tracking branch 'origin/master' into counted-loop-refactor - further refactor is_counted_loop() by extracting functions - WIP: refactor is_counted_loop() - WIP: refactor is_counted_loop() - WIP: review followups - reviewer suggested changes - line break - remove TODOs - Revert "improve formatting, naming, comments" This reverts commit fd6071761bdc47ab5695559dffd1e1dd6038d9f7. - improve formatting, naming, comments - ... and 11 more: https://git.openjdk.org/jdk/compare/ba6bd234...10635a07 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24458/files - new: https://git.openjdk.org/jdk/pull/24458/files/25cfe289..10635a07 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24458&range=03-04 Stats: 421775 lines in 4901 files changed: 168723 ins; 228889 del; 24163 mod Patch: https://git.openjdk.org/jdk/pull/24458.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24458/head:pull/24458 PR: https://git.openjdk.org/jdk/pull/24458 From eastigeevich at openjdk.org Thu May 29 15:11:59 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Thu, 29 May 2025 15:11:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Sat, 15 Mar 2025 00:41:58 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/code/nmethod.hpp line 587: >> >>> 585: address immutable_data_references_begin () const { return _immutable_data + _immutable_data_references_offset ; } >>> 586: address immutable_data_references_end () const { return immutable_data_end(); } >>> 587: >> >> If we are 
going to add typed fields to this data, maybe we should put it in a struct/class header at the beginning so we can access the field directly? > > I am not sure adding one integers field justifies new structure. It is better to keep this counter at the end of this block since it is accessed only few times. IMO we should declare the type of `immutable_data_references` and use it in `sizeof` instead of `int`. `sizeof(int)` looks too cryptic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114171931 From kxu at openjdk.org Thu May 29 17:29:52 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Thu, 29 May 2025 17:29:52 GMT Subject: RFR: 8353290: C2: Refactor PhaseIdealLoop::is_counted_loop() [v3] In-Reply-To: References: Message-ID: On Tue, 22 Apr 2025 07:42:42 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> WIP: review followups > > Was out last week but I'm seeing your last commit mentions WIP. Let me know when it's ready to have another look again :-) @chhagedorn Sorry this took longer than I'd like. I broke down the huge `is_counted_loop()` and extracted trip-counting checks to separate calls. Hopefully this makes the code clearer. Could you please re-review if possible? Thank you very much! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24458#issuecomment-2920083936 From vlivanov at openjdk.org Thu May 29 17:35:52 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 29 May 2025 17:35:52 GMT Subject: RFR: 8290892: C2: Intrinsify Reference.reachabilityFence [v3] In-Reply-To: References: <6v0ccqYSZk4OfbMCasFpwmVt7MGIV3Hw0f1ff7BcpJY=.b105566a-88ef-4171-88fe-54b139b00af0@github.com> Message-ID: On Thu, 29 May 2025 06:51:16 GMT, Aleksey Shipilev wrote: > a) intrinsics are performance optimizations, not correctness building blocks; The fact that many intrinsics are performance optimizations doesn't mean all intrinsics should be. > I also cannot remember any existing intrinsic that is a counter-example for (a) `Reference::get`. > This is not about the backports, but about having the correct fallback when we need it. I provided it as a justification why we may need access to current implementation even if we know it's not 100% correct. > Users then ask: "Is it safe to disable the intrinsic? Would my application crash/misbehave without it? Would it run slower?" There's no single answer possible here and it depending on what users expect. If there's a severe regression after the fix, as the stop-the-gap solution, it's common to restore original behavior users had before the change arrived (and that's `-XX:DisableIntrinsic=_Reference_reachabilityFence`). If somebody asks for a workaround which is 100% correct, it can be suggested to also disable method inlining (`-XX:CompileCommand=dontinline,java.lang.ref.Reference::reachabilityFence`), but it won't be the mode they run before and it'll introduce additional risks for them (mostly, performance-wise). So, there's a way to achieve both. Also, `-XX:DisableIntrinsic=_Reference_reachabilityFence` covers the most common scenario we may need and it's the simplest one from implementation POV. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25315#discussion_r2114437062 From kvn at openjdk.org Thu May 29 18:50:27 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 29 May 2025 18:50:27 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully Message-ID: By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. I did small code cleanup/renaming. 
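The skip-instead-of-abort behavior described above for relocation processing can be illustrated with a minimal sketch. It is not the code from the pull request: `ToyAddressTable`, `ToyBlob`, and `try_store_blob` are hypothetical stand-ins, and the only detail taken from the description is that a missing address causes the current blob to be skipped while processing continues.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an address table; not the real AOTCodeAddressTable.
class ToyAddressTable {
  std::unordered_map<std::uintptr_t, int> _ids;
public:
  // Registers an address and assigns it the next free id.
  void add(std::uintptr_t addr) {
    int id = static_cast<int>(_ids.size());
    _ids.emplace(addr, id);
  }
  // Returns the id for a known address, or -1 if the address is not in the table.
  // Per the description, a debug VM would assert on a miss; that assert is
  // omitted here so the sketch can exercise the product-mode path.
  int id_for_address(std::uintptr_t addr) const {
    auto it = _ids.find(addr);
    return it == _ids.end() ? -1 : it->second;
  }
};

// Hypothetical blob: just the list of addresses its relocations refer to.
struct ToyBlob {
  std::vector<std::uintptr_t> reloc_targets;
};

// Translates every relocation target to an id. Returns false, leaving out_ids
// untouched, if any target is unknown, so the caller can discard this one blob
// and keep going with the next.
bool try_store_blob(const ToyAddressTable& table, const ToyBlob& blob,
                    std::vector<int>& out_ids) {
  std::vector<int> ids;
  for (std::uintptr_t target : blob.reloc_targets) {
    int id = table.id_for_address(target);
    if (id < 0) {
      return false;  // skip this blob only; no VM-wide failure
    }
    ids.push_back(id);
  }
  out_ids.insert(out_ids.end(), ids.begin(), ids.end());
  return true;
}
```

Collecting the ids in a local vector first means a blob that fails halfway leaves no partial output behind, which is one simple way to make "discard and continue" safe.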
Tested: tier1-10 ------------- Commit messages: - 8357175: Failure to generate or load AOT code should be handled gracefully Changes: https://git.openjdk.org/jdk/pull/25525/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25525&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357175 Stats: 143 lines in 2 files changed: 55 ins; 44 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/25525.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25525/head:pull/25525 PR: https://git.openjdk.org/jdk/pull/25525 From duke at openjdk.org Thu May 29 19:55:01 2025 From: duke at openjdk.org (duke) Date: Thu, 29 May 2025 19:55:01 GMT Subject: Withdrawn: 8352316: More MergeStoreBench In-Reply-To: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: <_nIPPpxLXQ9RaMuM56xQfaj0L1E2iUV-tSsVlWQi_bc=.fcd6fa86-540b-4211-b84d-5b0a7c9cd747@github.com> On Wed, 19 Mar 2025 03:28:59 GMT, Shaojin Wen wrote: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/24108 From duke at openjdk.org Thu May 29 20:40:59 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 May 2025 20:40:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v18] In-Reply-To: References: Message-ID: <1PNflwgt1b7xPnhS3SU6ug5UsX2jnfo724zDqPy6RLg=.17e285cb-2e62-4bd3-9be0-ae9626ba8678@github.com> On Thu, 29 May 2025 14:58:00 GMT, Evgeny Astigeevich wrote: >> That's an interesting point. Now that I think about it I'm curious when `nm_copy->method()->code() != this` would actually happen. It's possible for nmethods that have been recompiled. 
In that situation though the first nmethod would have been marked as not used and would have failed the `is_relocatable` check in the first place. > > The current protocol: > 1. The `is_relocatable` check: if `true`, nmethod is in use. `method()` should be non-null. > 2. `CodeCache::gc_on_allocation()`: this can invalidate nmethod. > 3. Acquire `Compile_lock`, `CodeCache_lock`: `CodeCache_lock` to prevent CodeCache from modifications. `Compile_lock` blocks deoptimization of nmethod and invalidation of deps. > 4. Construct a copy. > 5. Invalidate CPU i-cache. > 6. Acquire `NMethodState_lock`: the lock prevents nmethod from changing its state. > 7. Set the copy as the current code. > > I think we need the following the protocol: > > 1. Invoke `CodeCache::gc_on_allocation()` first. > 2. Invoke `is_relocatable`. > 3. Acquire `Compile_lock`, `CodeCache_lock`. > 4. Construct a copy. > 5. Acquire `NMethodState_lock`. > 6. Check nmethod is not mark for deoptimization and is in use and make_in_use. > 6.1. Invalidate CPU i-cache. > 6.2. Set code > 6.3 Make not used > 6.4 Return copy > 7. Return nullptr I'm not sure it is safe to call `is_relocatable` after calling `CodeCache::gc_on_allocation()`. 
Like you mentioned it can invalidate the nmethod and any use of it could break ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114713909 From duke at openjdk.org Thu May 29 23:04:25 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 May 2025 23:04:25 GMT Subject: RFR: 8357223: AArch64: Optimize interpreter profile updates [v2] In-Reply-To: References: Message-ID: <7wo-_Wt-EiVGKgxMxU_MnTA8o1QQxH_LDtNzDShlOIY=.9c8093b7-ed4b-487d-afbe-5227362f1ade@github.com> > [JDK-8357223](https://bugs.openjdk.org/browse/JDK-8357223) > > The aarch64 version of [JDK-8356946](https://bugs.openjdk.org/browse/JDK-8356946) > > The reasoning for this change is the same as the x86 version's PR: > >> First, we carry the implementation for counter decrements without using them. This is dead code, and can be purged. >> >> Second, we care about overflows for 64-bit for some reason. I think this is a reminiscent of 32-bit x86 support, where we can plausibly have 32-bit counter overflow in a reasonable timeframe. But for 64-bit counter, we need tens of years of constantly bashing the counter to get it to overflow. No other profile counter update code, e.g. in C1, cares about this. 
> > Additional testing: > > - [x] Linux aarch64 fastdebug tier 1/2/3/4 Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25512/files - new: https://git.openjdk.org/jdk/pull/25512/files/5368815e..0a336523 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25512&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25512&range=00-01 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25512.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25512/head:pull/25512 PR: https://git.openjdk.org/jdk/pull/25512 From duke at openjdk.org Thu May 29 23:18:43 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 May 2025 23:18:43 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: References: Message-ID: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
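The replacement flow in the description above (deep copy, repoint the Java method, mark the old nmethod as unused) can be sketched as follows. This is a toy model under stated assumptions, not the PR's implementation: `ToyNMethod`, `ToyMethod`, `toy_state_lock`, and `relocate` are hypothetical names, and the re-check under the lock follows the protocol proposed in the review thread rather than anything confirmed to be in the patch.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

struct ToyMethod;  // hypothetical stand-in for a Java method

// Hypothetical stand-in for an nmethod; not the real HotSpot class.
struct ToyNMethod {
  enum State { in_use, not_used };
  std::vector<unsigned char> code;  // stand-in for the compiled code bytes
  State state = in_use;
  ToyMethod* method = nullptr;
};

struct ToyMethod {
  ToyNMethod* code = nullptr;  // the nmethod currently installed for this method
};

// Plays the role the thread assigns to NMethodState_lock: state cannot change
// while it is held.
std::mutex toy_state_lock;

// Returns the installed copy, or nullptr if the original stopped being the
// method's current, in-use code before the copy could be installed.
std::unique_ptr<ToyNMethod> relocate(ToyNMethod& old_nm) {
  // Deep copy: the vector copy duplicates the code bytes.
  auto copy = std::make_unique<ToyNMethod>(old_nm);

  std::lock_guard<std::mutex> guard(toy_state_lock);
  // Re-check under the lock, as the review suggests, before publishing.
  if (old_nm.state != ToyNMethod::in_use || old_nm.method == nullptr ||
      old_nm.method->code != &old_nm) {
    return nullptr;  // the unpublished copy is simply dropped
  }
  old_nm.method->code = copy.get();     // method now references the new nmethod
  old_nm.state = ToyNMethod::not_used;  // old one is left for later cleanup
  return copy;
}
```

The sketch leaves out everything specific to real code (relocation fix-ups, instruction-cache invalidation, the other locks named in the thread); it only shows why the state check and the pointer swap belong in the same critical section.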
Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: - Add requires GC to tests - Add type for immutable_data_references - Fix incorrect destination set if no trampoline available - Update assert note in nmethod::clear_inline_caches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/a0134a87..c5ff58f4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=17-18 Stats: 37 lines in 6 files changed: 22 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Thu May 29 23:24:58 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Thu, 29 May 2025 23:24:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Fri, 14 Mar 2025 22:03:44 GMT, Dean Long wrote: >> Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: >> >> Share immutable data between copied nmethods > > src/hotspot/share/code/nmethod.hpp line 587: > >> 585: address immutable_data_references_begin () const { return _immutable_data + _immutable_data_references_offset ; } >> 586: address immutable_data_references_end () const { return immutable_data_end(); } >> 587: > > If we are going to add typed fields to this data, maybe we should put it in a struct/class header at the beginning so we can access the field directly? 
@dean-long @vnkozlov I added `IMMUTABLE_DATA_REFERENCES` ([source](https://github.com/chadrako/jdk/blob/c5ff58f4e22cbbcdbe06997efe482f73fcee73f5/src/hotspot/share/code/nmethod.hpp#L577-L582)) to make the code more readable. Is that sufficient or would you like to see a struct to represent this instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114885140 From kvn at openjdk.org Fri May 30 00:16:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 May 2025 00:16:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Thu, 29 May 2025 23:22:25 GMT, Chad Rakoczy wrote: >> src/hotspot/share/code/nmethod.hpp line 587: >> >>> 585: address immutable_data_references_begin () const { return _immutable_data + _immutable_data_references_offset ; } >>> 586: address immutable_data_references_end () const { return immutable_data_end(); } >>> 587: >> >> If we are going to add typed fields to this data, maybe we should put it in a struct/class header at the beginning so we can access the field directly? > > @dean-long @vnkozlov I added `IMMUTABLE_DATA_REFERENCES` ([source](https://github.com/chadrako/jdk/blob/c5ff58f4e22cbbcdbe06997efe482f73fcee73f5/src/hotspot/share/code/nmethod.hpp#L577-L582)) to make the code more readable. Is that sufficient or would you like to see a struct to represent this instead? The more I look on it the more I like Dean's idea. I am withdrawing my previous objection. Let's have a **NOT virtual** class similar to CodeBlob. Consider moving associated `*_offset` fields from `nmethod` to new class. 
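A minimal sketch of the header idea discussed here, assuming a layout in which a small non-virtual struct sits at the start of the immutable data and the payload follows it. `ToyImmutableDataHeader`, `ToyImmutableData`, and the two offset fields are hypothetical; the thread only says that a typed reference counter and the associated `*_offset` fields could move into such a class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical header stored at the beginning of an immutable data block.
// Field names are illustrative only.
struct ToyImmutableDataHeader {
  std::int32_t references;           // typed counter instead of a raw int slot
  std::int32_t dependencies_offset;  // example of an offset moved out of the owner
  std::int32_t scopes_offset;
};

// "NOT virtual": no vtable pointer, so the struct can be copied as plain bytes.
static_assert(!std::is_polymorphic<ToyImmutableDataHeader>::value,
              "header must not have virtual functions");
static_assert(std::is_trivially_copyable<ToyImmutableDataHeader>::value,
              "header must be copyable with memcpy");

class ToyImmutableData {
  std::vector<unsigned char> _bytes;  // header followed by the payload
public:
  explicit ToyImmutableData(std::size_t payload_size)
      : _bytes(sizeof(ToyImmutableDataHeader) + payload_size) {
    ToyImmutableDataHeader h{1, 0, 0};  // one owner at creation
    std::memcpy(_bytes.data(), &h, sizeof(h));
  }

  // Reads the header by value; memcpy avoids aliasing problems on a byte buffer.
  ToyImmutableDataHeader header() const {
    ToyImmutableDataHeader h;
    std::memcpy(&h, _bytes.data(), sizeof(h));
    return h;
  }

  // Called when another owner starts sharing this block.
  void add_reference() {
    ToyImmutableDataHeader h = header();
    ++h.references;
    std::memcpy(_bytes.data(), &h, sizeof(h));
  }

  std::size_t payload_size() const {
    return _bytes.size() - sizeof(ToyImmutableDataHeader);
  }
};
```

With a typed header, size computations can use `sizeof(ToyImmutableDataHeader)` rather than a bare `sizeof(int)`, which is the readability concern raised earlier in the thread.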
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114924318 From dlong at openjdk.org Fri May 30 00:53:01 2025 From: dlong at openjdk.org (Dean Long) Date: Fri, 30 May 2025 00:53:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Fri, 30 May 2025 00:13:45 GMT, Vladimir Kozlov wrote: >> @dean-long @vnkozlov I added `IMMUTABLE_DATA_REFERENCES` ([source](https://github.com/chadrako/jdk/blob/c5ff58f4e22cbbcdbe06997efe482f73fcee73f5/src/hotspot/share/code/nmethod.hpp#L577-L582)) to make the code more readable. Is that sufficient or would you like to see a struct to represent this instead? > > The more I look on it the more I like Dean's idea. I am withdrawing my previous objection. > Let's have a **NOT virtual** class similar to CodeBlob. > Consider moving associated `*_offset` fields from `nmethod` to new class. @vnkozlov , when I first proposed this [1], you had a concern about Leyden. Is it still a concern? [1] https://github.com/openjdk/jdk/pull/21276#discussion_r1853211488 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114957912 From kvn at openjdk.org Fri May 30 01:13:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 May 2025 01:13:07 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: On Fri, 30 May 2025 00:50:31 GMT, Dean Long wrote: >> The more I look on it the more I like Dean's idea. I am withdrawing my previous objection. >> Let's have a **NOT virtual** class similar to CodeBlob. >> Consider moving associated `*_offset` fields from `nmethod` to new class. > > @vnkozlov , when I first proposed this [1], you had a concern about Leyden. 
Is it still a concern? > [1] https://github.com/openjdk/jdk/pull/21276#discussion_r1853211488 I still have that concern for **mutable** data, which includes relocations that are accessed frequently. I don't think accessing **immutable** data is performance critical. It is mostly accessed during deoptimization and from JVMTI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114974264 From kvn at openjdk.org Fri May 30 01:21:07 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 May 2025 01:21:07 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> Message-ID: <3lvuXGbqkCDeGwkzDQtzhkbZGN1XgcTcuFfL0_TUPvA=.4ba152ea-b3ab-4339-a42c-03d78bfcc829@github.com> On Fri, 30 May 2025 01:09:59 GMT, Vladimir Kozlov wrote: >> @vnkozlov , when I first proposed this [1], you had a concern about Leyden. Is it still a concern? >> [1] https://github.com/openjdk/jdk/pull/21276#discussion_r1853211488 > > I still have that concern for **mutable** data, which includes relocations that are accessed frequently. > > I don't think accessing **immutable** data is performance critical. It is mostly accessed during deoptimization and from JVMTI. Actually even **mutable** data is not critical, since we did not include the oops data section (we keep it with the nmethod). It currently contains relocations, metadata (klass*, method*) and JVMCI data. We can experiment in a separate RFE to see whether we can use a separate class for it too.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2114979698 From iveresov at openjdk.org Fri May 30 03:35:50 2025 From: iveresov at openjdk.org (Igor Veresov) Date: Fri, 30 May 2025 03:35:50 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: On Thu, 29 May 2025 18:45:11 GMT, Vladimir Kozlov wrote: > By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. > > Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. > > I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. > > The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. > > Tested: tier1-10 Looks good ------------- Marked as reviewed by iveresov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25525#pullrequestreview-2880097189 From amitkumar at openjdk.org Fri May 30 03:53:56 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 30 May 2025 03:53:56 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 04:11:46 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x. >> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 
0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > switch to vector stores As of now I am not getting any regression in the benchmark. And vector store + mvc is not performing better then the vector store only solution. So I am moving ahead with the integration. Thanks to all for the help and reviews/suggestion you provided. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2921150528 PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2921151461 From amitkumar at openjdk.org Fri May 30 03:53:57 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 30 May 2025 03:53:57 GMT Subject: Integrated: 8353500: [s390x] Intrinsify Unsafe::setMemory In-Reply-To: References: Message-ID: On Mon, 7 Apr 2025 08:44:07 GMT, Amit Kumar wrote: > Unsafe::setMemory intrinsic implementation for s390x. 
> > Stub Code: > > > StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) > -------------------------------------------------------------------------------- > 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 > 0x000003ffb04b63c4: nill %r1,7 > 0x000003ffb04b63c8: je 0x000003ffb04b6410 > 0x000003ffb04b63cc: nill %r1,3 > 0x000003ffb04b63d0: je 0x000003ffb04b6460 > 0x000003ffb04b63d4: nill %r1,1 > 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 > 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 > 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 > 0x000003ffb04b63e8: je 0x000003ffb04b6402 > 0x000003ffb04b63ec: nopr > 0x000003ffb04b63ee: nopr > 0x000003ffb04b63f0: sth %r4,0(%r2) > 0x000003ffb04b63f4: sth %r4,2(%r2) > 0x000003ffb04b63f8: agfi %r2,4 > 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 > 0x000003ffb04b6402: nilf %r3,2 > 0x000003ffb04b6408: ber %r14 > 0x000003ffb04b640a: sth %r4,0(%r2) > 0x000003ffb04b640e: br %r14 > 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 > 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 > 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 > 0x000003ffb04b6428: je 0x000003ffb04b6446 > 0x000003ffb04b642c: nopr > 0x000003ffb04b642e: nopr > 0x000003ffb04b6430: stg %r4,0(%r2) > 0x000003ffb04b6436: stg %r4,8(%r2) > 0x000003ffb04b643c: agfi %r2,16 > 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 > 0x000003ffb04b6446: nilf %r3,8 > 0x000003ffb04b644c: ber %r14 > 0x000003ffb04b644e: stg %r4,0(%r2) > 0x000003ffb04b6454: br %r14 > 0x000003ffb04b6456: nopr > 0x000003ffb04b6458: nopr > 0x000003ffb04b645a: nopr > 0x000003ffb04b645c: nopr > 0x000003ffb04b645e: nopr > 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 > 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 > 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 > 0x000003ffb04b6472: je 0x000003ffb04b6492 > 0x000003ffb04b6476: nopr > 0x000003ffb04b6478: nopr > 0x000003ffb04b647a: nopr > 0x000003ffb04b647c: nopr > 0x000003ffb04b647e: nopr > 0x000003ffb04b6480: st %r4,0(%r2) > 
0x000003ffb04b6484: st %r4,4(%r2) > 0x000003ffb04b6488: agfi %r2,8 > 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 > 0x000003ffb04b6492: nilf %r3,4 > 0x000003ffb04b6498: ber %r14 > 0x000003ffb04b649a: st %r4,0(%r2) > 0x000003ffb04b649e: br %r14 > 0x000003ffb04b64a0: risbgz %r1,%r3,32,63,63 > 0x000003ffb04b64a6: je 0x000003ffb04b64c2 > 0x000003... This pull request has now been integrated. Changeset: 20005511 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/20005511e3612d6a5f12fa83066f02c88c628e8b Stats: 119 lines in 1 file changed: 119 ins; 0 del; 0 mod 8353500: [s390x] Intrinsify Unsafe::setMemory Reviewed-by: lucy, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/24480 From kvn at openjdk.org Fri May 30 04:07:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 May 2025 04:07:50 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: On Thu, 29 May 2025 18:45:11 GMT, Vladimir Kozlov wrote: > By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. > > Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. > > I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. > > The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. 
These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. > > Tested: tier1-10 Thank you, Igor ------------- PR Comment: https://git.openjdk.org/jdk/pull/25525#issuecomment-2921166012 From dskantz at openjdk.org Fri May 30 06:26:00 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Fri, 30 May 2025 06:26:00 GMT Subject: RFR: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control [v3] In-Reply-To: References: Message-ID: On Tue, 27 May 2025 12:49:40 GMT, Daniel Skantz wrote: >> This pull request contains a fix for JDK-8356246. >> >> During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). >> >> JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. >> >> JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. >> >> The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. >> >> Testing: >> T1-4. 
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > test comment Thanks for the reviews and suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25461#issuecomment-2921350068 From dskantz at openjdk.org Fri May 30 06:26:02 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Fri, 30 May 2025 06:26:02 GMT Subject: Integrated: 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control In-Reply-To: References: Message-ID: On Tue, 27 May 2025 07:56:59 GMT, Daniel Skantz wrote: > This pull request contains a fix for JDK-8356246. > > During stacked concatenations, a pair of `StringBuilder.append().toString()` links SB and SB2 could have a diamond if structure `(Region -> -> If)` created by String.valueOf that depends on the return value of SB1, which is going away (replaced by top() in `eliminate_call` in stringopts). > > JDK-8271341 added folding of the region of the diamond-if to stringopts to avoid the case where a live part of the graph becomes unreachable as this top() propagates through the graph too quickly. > > JDK-8291775 was a follow-up fix and instead used a constant test as input to the diamond If, as a case was discovered where the If was processed before the Region leading to a broken graph. > > The code in JDK-8271341 assumes that the input to the If is a boolean, not a constant. If two diamond if-region structures in the same StringBuilder candidate share the same test, the second iteration in `eliminate_unneeded_control` will fail with an unexpected input. The proposed solution is to skip over the second iteration as the test has already been replaced by a constant -- both structures will be simplified independently during IGVN. > > Testing: > T1-4. This pull request has now been integrated. 
Changeset: 6f9e1175 Author: Daniel Skantz URL: https://git.openjdk.org/jdk/commit/6f9e1175a983c735c1beed755ec5b14b476858d7 Stats: 62 lines in 2 files changed: 62 ins; 0 del; 0 mod 8356246: C2: Compilation fails with "assert(bol->is_Bool()) failed: unexpected if shape" in StringConcat::eliminate_unneeded_control Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/25461 From hgreule at openjdk.org Fri May 30 07:26:13 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 30 May 2025 07:26:13 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v4] In-Reply-To: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: > This change improves the precision of the `Mod(I|L)Node::Value()` functions. > > I reordered the structure a bit. First, we handle constants, afterwards, we handle ranges. The bottom checks seem to be excessive (`Type::BOTTOM` is covered by using `isa_(int|long)()`, the local bottom is just the full range). Given we can even give reasonable bounds if only one input has any bounds, we don't want to return early. > The changes after that are commented. Please let me know if the explanations are good, or if you have any suggestions. > > ### Monotonicity > > Before, a 0 divisor resulted in `Type(Int|Long)::POS`. Initially I wanted to keep it this way, but that violates monotonicity during PhaseCCP. As an example, if we see a 0 divisor first and a 3 afterwards, we might try to go from `>=0` to `-2..2`, but the meet of these would be `>=-2` rather than `-2..2`. Using `Type(Int|Long)::ZERO` instead (zero is always in the resulting value if we cover a range). > > ### Testing > > I added tests for cases around the relevant bounds. 
I also ran tier1, tier2, and tier3 but didn't see any related failures after addressing the monotonicity problem described above (I'm having a few unrelated failures on my system currently, so separate testing would be appreciated in case I missed something). > > Please review and let me know what you think. > > ### Other > > The `UMod(I|L)Node`s were adjusted to be more in line with their signed variants. This change diverges them again, but similar improvements could be made after #17508. > > While experimenting with these changes, I stumbled upon a few things that aren't directly related to this change, but might be worth looking into further: > - If the divisor is a constant, we will directly replace the `Mod(I|L)Node` with more but less expensive nodes in `::Ideal()`. Type analysis for these nodes combined is less precise, which means we miss potential cases where this would help, e.g., removing range checks. Would it make sense to delay the replacement? > - To force non-negative ranges, I'm using `char`. I noticed that method parameters of sub-int integer types all fall back to `TypeInt::INT`. This seems to be an intentional change of https://github.com/openjdk/jdk/commit/200784d505dd98444c48c9ccb7f2e4df36dcbb6a. The bug report is private, so I can't really judge if that part is necessary, but it seems odd.
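The monotonicity argument in the description above (going from `>=0` to `-2..2` when a 0 divisor is followed by a 3) can be checked with a toy interval model. This is only a sketch and not HotSpot code: `Range`, `meet`, and `modBound` are hypothetical names, and `meet` is modeled here as the smallest interval covering both inputs, which is an assumption about how integer ranges combine during CCP.

```java
final class IntRangeMeetSketch {
    // Hypothetical closed interval [lo, hi]; a stand-in for an integer type range.
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        // Smallest interval covering both ranges (assumed model of the meet).
        Range meet(Range other) {
            return new Range(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }

        @Override
        public String toString() {
            return "[" + lo + ", " + hi + "]";
        }
    }

    // Models of the two candidate results for a 0 divisor.
    static final Range POS = new Range(0, Integer.MAX_VALUE);
    static final Range ZERO = new Range(0, 0);

    // Bound for x % d with unknown x and a nonzero constant divisor d:
    // the result lies in -(|d| - 1) .. (|d| - 1).
    // Assumes d != Integer.MIN_VALUE, since Math.abs would overflow there.
    static Range modBound(int divisor) {
        int m = Math.abs(divisor) - 1;
        return new Range(-m, m);
    }

    public static void main(String[] args) {
        Range forThree = modBound(3);
        // Starting from POS, the combined range widens to [-2, MAX] instead of -2..2.
        System.out.println("POS meet " + forThree + " = " + POS.meet(forThree));
        // Starting from ZERO, the combined range is exactly -2..2.
        System.out.println("ZERO meet " + forThree + " = " + ZERO.meet(forThree));
    }
}
```

Under this model, starting from the zero-only range keeps the later, tighter bound reachable, whereas starting from the non-negative range does not.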
Hannes Greule has updated the pull request incrementally with one additional commit since the last revision: Add randomized test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/25254/files - new: https://git.openjdk.org/jdk/pull/25254/files/f93aeb12..80914319 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=25254&range=02-03 Stats: 189 lines in 2 files changed: 180 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/25254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25254/head:pull/25254 PR: https://git.openjdk.org/jdk/pull/25254 From hgreule at openjdk.org Fri May 30 07:26:14 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Fri, 30 May 2025 07:26:14 GMT Subject: RFR: 8356813: Improve Mod(I|L)Node::Value [v2] In-Reply-To: References: <2Jf_gfvRlKcmCFoQHp5T0WW_fU_yK5-0Z3z41f00-YU=.164be9f0-fae1-44bb-84c3-846d8c2c0db2@github.com> Message-ID: On Wed, 28 May 2025 09:47:59 GMT, Emanuel Peter wrote: >> Hannes Greule has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update ModL comment >> - Use TOP instead of ZERO >> - Apply suggested test changes > > test/hotspot/jtreg/compiler/c2/gvn/ModINodeValueTests.java line 201: > >> 199: // in bounds, cannot optimize >> 200: return ((byte) x) % (((char) y) + 1) <= -128; >> 201: } > > Nice work with the examples you already have, and randomizing some of it! > > I would like to see one more generalized test. > - compute `res = lhs % rhs` > - Truncate both `lhs` and `rhs` with randomly produced bounds from Generators, like this: `lhs = Math.max(lo, Math.min(hi, lhs))`. > - Below, add all sorts of comparisons with random constants, like this: `if (res < CON) { sum += 1; }`. If the output range is wrong, this could wrongly constant fold, and allow us to catch that. 
> > Then fuzz the generated method a few times with random inputs for `lhs` and `rhs`, and check that the `sum` and `res` value are the same for compiled and interpreted code. > > I hope that makes sense :) > This is currently my best method to check if ranges are correct, and I think it is quite important because often tests are only written with constants in mind, but less so with ranges, and then we mess up the ranges because it is just too tricky. > > This is an example, where I asked someone to try this out as well: > https://github.com/openjdk/jdk/pull/23089/files#diff-12bebea175a260a6ab62c22a3681ccae0c3d9027900d2fdbd8c5e856ae7d1123R404-R422 I introduced tests now very similar to the example you linked. I added a `@ForceInline` annotation to the `clamp` method to avoid test failures there. I ran the tests with REPEAT_COUNT=200 and didn't encounter any failure. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25254#discussion_r2115297961 From xgong at openjdk.org Fri May 30 07:48:31 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 30 May 2025 07:48:31 GMT Subject: RFR: 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times Message-ID: <-SKyhptjFPhuOPflySOZXJloR_Vgr4sC-xB5dSQXxZU=.fd6922bc-2498-4f4e-873a-999f82cd0a1a@github.com> C2 compiler fails to recognize counted loops when the induction variable is constrained by multiple consecutive `CastII` nodes. This prevents optimizations like range check elimination, loop unrolling and auto-vectorization for these loops. Please refer to the detailed discussion for a related performance issue from [1]. 
The ideal graph of such a loop typically looks like: /-----------| | | | ConI | loop | / / | | / / \ AddI / RangeCheck \ / | | \ / | IfTrue Phi | \ | | RangeCheck \ | | \ CastII / <- Range check #1 | | / IfTrue | | \ | | CastII | <- Range check #2 | / |-------/ For a counted loop, the loop induction variable (i.e., `Phi`) should ideally be the input of `AddI`. However, in the above case, it is used by two consecutive `CastII` nodes generated by two different range check operations. The compiler should skip all such `CastII` nodes when recognizing a counted loop. This patch modifies the counted loop recognition code to iteratively uncast the loop `iv` until no `CastII` nodes remain, enabling proper counted loop recognition even when the induction variable undergoes multiple range constraint operations. Test: - Tested tier1, tier2, tier3, and no regressions are found. - An additional test case is added to verify the fix. Performance: Here is the performance gain on an NVIDIA Grace machine, which has an AArch64 architecture: Benchmark Mode Cnt Unit Before After Gain CountedLoopCastIV.loop_iv_int thrpt 30 ops/s 941482.597 4389292.439 4.66 CountedLoopCastIV.loop_iv_long thrpt 30 ops/s 884563.232 1441485.455 1.62 We can also observe a similar uplift on an x86_64 machine.
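The difference between stripping one cast and stripping casts iteratively can be illustrated with a toy node model. This is a sketch under stated assumptions, not the patch itself: `Node`, `uncastOnce`, and `uncastAll` are hypothetical names, and each node here has at most one value input, unlike real ideal-graph nodes.

```java
final class UncastSketch {
    // Hypothetical, minimal stand-in for an ideal-graph node; not the HotSpot Node class.
    static final class Node {
        final String op;
        final Node input; // single value input, null for leaves

        Node(String op, Node input) {
            this.op = op;
            this.input = input;
        }
    }

    // Strips at most one CastII, so a second consecutive cast is left in place.
    static Node uncastOnce(Node n) {
        return "CastII".equals(n.op) ? n.input : n;
    }

    // Strips CastII nodes repeatedly until a non-cast node is reached.
    static Node uncastAll(Node n) {
        while ("CastII".equals(n.op)) {
            n = n.input;
        }
        return n;
    }

    public static void main(String[] args) {
        Node phi = new Node("Phi", null);
        Node cast1 = new Node("CastII", phi);   // range check #1
        Node cast2 = new Node("CastII", cast1); // range check #2

        System.out.println(uncastOnce(cast2).op); // prints "CastII"
        System.out.println(uncastAll(cast2).op);  // prints "Phi"
    }
}
```

With two stacked casts, the single-step version still ends on a `CastII`, while the loop reaches the `Phi`, which is the shape the counted loop recognition expects as the induction variable.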
[1] https://github.com/openjdk/jdk/pull/25138#issuecomment-2892720654 ------------- Commit messages: - 8357726: C2 fails to recognize the counted loop when induction variable range is changed multiple times Changes: https://git.openjdk.org/jdk/pull/25539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25539&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357726 Stats: 257 lines in 3 files changed: 256 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25539/head:pull/25539 PR: https://git.openjdk.org/jdk/pull/25539 From eosterlund at openjdk.org Fri May 30 07:58:51 2025 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 30 May 2025 07:58:51 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/25488#pullrequestreview-2880492044 From xgong at openjdk.org Fri May 30 08:17:53 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Fri, 30 May 2025 08:17:53 GMT Subject: RFR: 8355563: VectorAPI: Refactor current implementation of subword gather load API In-Reply-To: References: Message-ID: On Tue, 20 May 2025 05:40:04 GMT, Xiaohong Gong wrote: > > @XiaohongGong Thanks for splitting this one out, and for investigating the regressions here. > > Putting the permalink here, fixed to the current change (the link you pasted will always refer to the newest, which may later on point to the wrong line when lines above are inserted / deleted): > > https://github.com/openjdk/jdk/blob/7077535c0b0a6ea0a2a167f9135b1504a3d71fb3/src/hotspot/share/opto/loopnode.cpp#L1659-L1661 > > > > I wonder if we should just use `Node::uncast` there? But I'm quite unsure about that. > > Sounds good to me. I will investigate it in depth. Thanks! Hi @eme64 @jatin-bhateja, I've created a PR https://github.com/openjdk/jdk/pull/25539 to fix this issue. With this change, the performance regression can be fixed as well. Could you please take a look at that change and help run the test on different x86 machines? Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/25138#issuecomment-2921574716 From aph at openjdk.org Fri May 30 08:35:00 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 30 May 2025 08:35:00 GMT Subject: RFR: 8353500: [s390x] Intrinsify Unsafe::setMemory [v5] In-Reply-To: References: Message-ID: On Mon, 26 May 2025 04:11:46 GMT, Amit Kumar wrote: >> Unsafe::setMemory intrinsic implementation for s390x.
>> >> Stub Code: >> >> >> StubRoutines::unsafe_setmemory [0x000003ffb04b63c0, 0x000003ffb04b64d0] (272 bytes) >> -------------------------------------------------------------------------------- >> 0x000003ffb04b63c0: ogrk %r1,%r2,%r3 >> 0x000003ffb04b63c4: nill %r1,7 >> 0x000003ffb04b63c8: je 0x000003ffb04b6410 >> 0x000003ffb04b63cc: nill %r1,3 >> 0x000003ffb04b63d0: je 0x000003ffb04b6460 >> 0x000003ffb04b63d4: nill %r1,1 >> 0x000003ffb04b63d8: jlh 0x000003ffb04b64a0 >> 0x000003ffb04b63dc: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b63e2: risbgz %r1,%r3,32,63,62 >> 0x000003ffb04b63e8: je 0x000003ffb04b6402 >> 0x000003ffb04b63ec: nopr >> 0x000003ffb04b63ee: nopr >> 0x000003ffb04b63f0: sth %r4,0(%r2) >> 0x000003ffb04b63f4: sth %r4,2(%r2) >> 0x000003ffb04b63f8: agfi %r2,4 >> 0x000003ffb04b63fe: brct %r1,0x000003ffb04b63f0 >> 0x000003ffb04b6402: nilf %r3,2 >> 0x000003ffb04b6408: ber %r14 >> 0x000003ffb04b640a: sth %r4,0(%r2) >> 0x000003ffb04b640e: br %r14 >> 0x000003ffb04b6410: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6416: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b641c: risbg %r4,%r4,0,31,32 >> 0x000003ffb04b6422: risbgz %r1,%r3,32,63,60 >> 0x000003ffb04b6428: je 0x000003ffb04b6446 >> 0x000003ffb04b642c: nopr >> 0x000003ffb04b642e: nopr >> 0x000003ffb04b6430: stg %r4,0(%r2) >> 0x000003ffb04b6436: stg %r4,8(%r2) >> 0x000003ffb04b643c: agfi %r2,16 >> 0x000003ffb04b6442: brct %r1,0x000003ffb04b6430 >> 0x000003ffb04b6446: nilf %r3,8 >> 0x000003ffb04b644c: ber %r14 >> 0x000003ffb04b644e: stg %r4,0(%r2) >> 0x000003ffb04b6454: br %r14 >> 0x000003ffb04b6456: nopr >> 0x000003ffb04b6458: nopr >> 0x000003ffb04b645a: nopr >> 0x000003ffb04b645c: nopr >> 0x000003ffb04b645e: nopr >> 0x000003ffb04b6460: risbg %r4,%r4,48,55,8 >> 0x000003ffb04b6466: risbg %r4,%r4,32,47,16 >> 0x000003ffb04b646c: risbgz %r1,%r3,32,63,61 >> 0x000003ffb04b6472: je 0x000003ffb04b6492 >> 0x000003ffb04b6476: nopr >> 0x000003ffb04b6478: nopr >> 0x000003ffb04b647a: nopr >> 0x000003ffb04b647c: nopr >> 0x000003ffb04b647e: 
nopr >> 0x000003ffb04b6480: st %r4,0(%r2) >> 0x000003ffb04b6484: st %r4,4(%r2) >> 0x000003ffb04b6488: agfi %r2,8 >> 0x000003ffb04b648e: brct %r1,0x000003ffb04b6480 >> 0x000003ffb04b6492: nilf %r3,4 >> 0x000003ffb04b6498: ber %r14 >> 0x000003ffb04b649a: st %r4,0(%r2) >> 0x0000... > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > switch to vector stores What are all those `nopr`s for? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24480#issuecomment-2921613997 From chagedorn at openjdk.org Fri May 30 10:43:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 May 2025 10:43:16 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 23 May 2025 10:52:20 GMT, Emanuel Peter wrote: >> **Goal** >> We want to generate Java source code: >> - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. >> - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). >> >> Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). >> >> **How to get started** >> When reviewing, please start by looking at: >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 >> >> We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
>> >> Second, look at this advanced test: >> https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 >> >> And then for a "tutorial", look at: >> `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` >> >> It shows these features: >> - The `body` of a Template is essentially a list of `Token`s that are concatenated. >> - Templates can be nested: a `TemplateWithArgs` is also a `Token`. >> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. >> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. >> - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. >> - The use of recursive templates, and `fuel` to limit the recursion. >> - `Name`s: useful to register field and variable names in code scopes. >> >> Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. >> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 >> >> For a better experience, you may want... > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 > - move verification Thanks for all the updates and discussions! I've worked my way through the documentation in `Template` and the examples again in some more detail. 
It's much better and the new explanations are well done, excellent work! I left some comments here and there but mostly minor things. I will have another look at the implementation - probably only finished by Monday. The design now looks great. I'm glad we could find a good solution now after some more iterations :-) test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 150: > 148: * {@link Template#make(String, Function)}. For each number of arguments there is an implementation > 149: * (e.g. {@link Template.TwoArgs} for two arguments). This allows the use of generics for the > 150: * {@link Template} argument types which enables type checking of the {@link Template} arguments. Maybe add: Suggestion: * {@link Template} argument types which enables type checking of the {@link Template} arguments. * It is currently only allowed to use up to three arguments. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 160: > 158: * Ideally, we would have used
string templates to inject these Template arguments into the strings. > 159: * But since string templates are not (yet) available, the Templates provide hashtag replacements > 160: These paragraphs should probably belong together? And maybe you want to wrap the long line above. Suggestion: test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 174: > 172: *

> 173: * The dollar and hashtag names must have at least one character. The first character must be a letter > 174: * or underscore (i.e. {@code a-zA-Z_}), the other characters can also be digits (i.e. {@code a-zA-Z0-9_}). This does not seem to be enforced: var testTemplate = Template.make(() -> body( """ public class Foo { public static void main() { int $1var = 34; } } """ )); System.out.println(testTemplate.render()); Results in: public class Foo { public static void main() { int $1var = 34; } } which compiles fine. I can also change it to `$$var` which renders to `$var_1` which also compiles fine. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 190: > 188: *

> 189: * The writer of recursive {@link Template}s must ensure that this recursion terminates. To unify the > 190: * approach across {@link Template}s, we introduce the concept of {@link fuel}. Templates are rendered starting Suggestion: * approach across {@link Template}s, we introduce the concept of {@link #fuel}. Templates are rendered starting test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 201: > 199: * are available, and if they are mutable or immutable. We model fields and variables with {@link DataName}s, > 200: * which we can add to the current scope with {@link addDataName}. We can access the {@link DataName}s with > 201: * {@link dataNames}. We can filter for {@link DataName}s of specific {@link DataName.Type}s, and then Suggestion: * which we can add to the current scope with {@link #addDataName}. We can access the {@link DataName}s with * {@link #dataNames}. We can filter for {@link DataName}s of specific {@link DataName.Type}s, and then test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 222: > 220: * that the lambda is always executed first, and the tokens are evaluated afterwards. A method like > 221: * {@code dataNames(MUTABLE).exactOf(type).count()} is a method that is executed during the evaluation > 222: * of the lambda. But a method like {@link addDataName} returns a token, and does not immediately add Suggestion: * of the lambda. But a method like {@link #addDataName} returns a token, and does not immediately add test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 252: > 250: * // "otherTemplate" were to count the DataNames, the count would be increased > 251: * // by 2 compared to "c1". > 252: * otherTemplate.asToken(), Last arg: Suggestion: * otherTemplate.asToken() test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 260: > 258: * // evaluated later. 
By that time, the token for "v1" is evaluated, and so the > 259: * // nested Template would observe an increment in the count. > 260: * anotherTemplate.asToken(), Last arg: Suggestion: * anotherTemplate.asToken() test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 324: > 322: * A {@link Template} with one argument. > 323: * > 324: * @param arg0Name The name of the (first) argument, used for hashtag replacements in the {@link Template}. Nit and I'm okay with both: Should we name the first argument arg1 instead of arg0? Starting from zero might not be expected. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 341: > 339: * {@link Template}. > 340: */ > 341: public TemplateToken asToken(A a) { Could also be named `valueArg0` which is more expressive and it's easier when working in an IDE: ![image](https://github.com/user-attachments/assets/ab44b841-2cd0-4c13-942c-0188f60b421c) vs. ![image](https://github.com/user-attachments/assets/63f06a73-1ecd-4592-bcad-adc637a4a096) You could apply that change for the `render()` methods as well. Same for the two and three arg versions. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 539: > 537: * @param arg0Name The name of the (first) argument for hashtag replacement. > 538: * @return A {@link Template} with one argument. > 539: */ Just a general thought and not really something we can enforce by the framework, but we might want to mention here as well that the `arg0Name` string should match the lambda parameter for easier application and consistency? Theoretically (and not very clever), you can do that: var testTemplate = Template.make("a", "b", (Integer b, Integer a) -> body( """ public class Foo { public static void main() { int a1 = #a; int b1 = #b; """, "int a2 = " + a + ";\n", // != a1, oops "int b2 = " + b + ";\n", // != b1, oops """ } } """ )); We could make the same remark in the two and three arg `make()` versions as well. 
test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 668: > 666: /** > 667: * Define a hashtag replacement for {@code "#key"}, with a specific value, which is also captured > 668: * by the provided {@code 'function'} with type {@code }. Suggestion: * by the provided {@code function} with type {@code }. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 683: > 681: * @param value The value that the hashtag is replaced with. > 682: * @param The type of the value. > 683: * @param function The function that is applied with the provided {@code 'value'}. Suggestion: * @param function The function that is applied with the provided {@code value}. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 696: > 694: * rendering a template with {@code render(fuel)} (e.g. {@link ZeroArgs#render(float)}). > 695: */ > 696: static final float DEFAULT_FUEL = 100.0f; Fields defined in an interface are implicitly static and final Suggestion: float DEFAULT_FUEL = 100.0f; test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 703: > 701: * with {@link #setFuelCost(float)} inside {@link #body(Object...)}. > 702: */ > 703: static final float DEFAULT_FUEL_COST = 10.0f; Suggestion: float DEFAULT_FUEL_COST = 10.0f; test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 762: > 760: * @param weight The weight of the {@link DataName}, which correlates to the probability > 761: * of this {@link DataName} being chosen when we sample. > 762: * Must be a value from 0 to 1000. But you disallow 0 below. Should this be 1 to 1000? Suggestion: * Must be a value from 1 to 1000. 
test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 771: > 769: } > 770: boolean mutable = mutability == DataName.Mutability.MUTABLE; > 771: if (0 >= weight || weight > 1000) { Could be more readable but up to you Suggestion: if (weight <= 0 || weight > 1000) { test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 810: > 808: * @param weight The weight of the {@link StructuralName}, which correlates to the probability > 809: * of this {@link StructuralName} being chosen when we sample. > 810: * Must be a value from 0 to 1000. Same here, 1 to 1000? Suggestion: * Must be a value from 1 to 1000. test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 814: > 812: */ > 813: static Token addStructuralName(String name, StructuralName.Type type, int weight) { > 814: if (0 >= weight || weight > 1000) { Could be more readable but up to you Suggestion: if (weight <= 0 || weight > 1000) { test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 61: > 59: public class TestAdvanced { > 60: private static final Random RANDOM = Utils.getRandomInstance(); > 61: Unused Suggestion: test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 66: > 64: CompileFramework comp = new CompileFramework(); > 65: > 66: // Add java source files. Suggestion: // Add Java source files. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 84: > 82: // Hint: if you want to see the generated source code, you can enable > 83: // printing of the source code that the CompileFramework receives, > 84: // with -DCompileFrameworkVerbose=true Maybe also add here that the printed output is not formatted and one might consider dumping it to an IDE or other tool to auto-format. 
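For the weight checks discussed above, the boundaries are easy to get wrong in the Javadoc, so a small sketch may help. It assumes the suggested condition `weight <= 0 || weight > 1000`; the class name and the choice of `IllegalArgumentException` are illustrative assumptions, not what the framework actually throws.

```java
class WeightRangeDemo {
    // Mirrors the suggested condition: only weights in 1..1000 are accepted.
    // The exception type is an assumption made for this sketch.
    static void checkWeight(int weight) {
        if (weight <= 0 || weight > 1000) {
            throw new IllegalArgumentException("Unexpected weight: " + weight);
        }
    }

    // Convenience wrapper that reports acceptance as a boolean.
    static boolean accepts(int weight) {
        try {
            checkWeight(weight);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int w : new int[] {0, 1, 1000, 1001}) {
            System.out.println(w + " accepted: " + accepts(w));
        }
    }
}
```

With this condition 0 is rejected, which is why "from 1 to 1000" is the wording that matches the code.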
test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 151: > 149: "System.out.println(", arg, ");\n", // capture arg via lambda argument > 150: "System.out.println(#arg);\n", // capture arg via hashtag replacement > 151: "System.out.println(#{arg});\n", // capture arg via hashtag replacement with brackets It's not clear here why one should use brackets. If there is an argument for those further down, then you can cross reference. Otherwise, it might need some explanation here. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 154: > 152: // The Template Framework allows two ways of formatting Strings, either > 153: // by appending to the comma-separated list of Tokens, or by hashtag > 154: // replacements. Appending as a Token works whenever one has a reference Might be more clear: Suggestion: // by appending to the comma-separated list of Tokens passed to body(), or by hashtag // replacements inside a single string. Appending as a Token works whenever one has a reference test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 199: > 197: // In the code below, we use "var" as a local variable. But if we were > 198: // to instantiate this template twice, the names could conflict. Hence, > 199: // we automatically rename the names that have a $ prepended. Suggestion: // we automatically rename the names that have a $ prepended with // var_1, var_2, etc. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 208: > 206: var template2 = Template.make("x", (Integer x) -> > 207: // Sometimes it can be helpful to not just create a hashtag replacement > 208: // with let, but also to capture the variable. Suggestion: // with let, but also to capture the variable to use it as lambda parameter. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 258: > 256: > 257: // Render templateClass to String. 
> 258: return templateClass.render(); When printing this, it starts at `var_2` and not `var_1`. Why is that? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 298: > 296: > 297: // In this example, we look at the use of Hooks. They allow us to reach back, to outer > 298: // scopes. For example, we can reach out from inside a method body to a hook set at Should we say "anchored" with the recent set -> anchor renaming? Suggestion: // scopes. For example, we can reach out from inside a method body to a hook anchored at test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 306: > 304: var myHook = new Hook("MyHook"); > 305: > 306: var template1 = Template.make("name", "value", (String name, Integer value) -> body( One could generally think about using `_` for unused lambda parameters which I think is the common convention. But then I guess we would need to update the documentation about saying "name" and "String name" should be the same and make an exception for unused ones. I don't know. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 314: > 312: var template2 = Template.make("x", (Integer x) -> body( > 313: """ > 314: // Let us go back to the hook, and define a field named $field... I guess one need to get used to the fact that "go back" means back in the lambda evaluation order which is reversed to the Java code order. Maybe we can clarify that with something like: Suggestion: // Let us go back to where we anchored the hook with anchor() and define a field named $field there. // Note that in the Java code we have not defined anchor() on the hook, yet. But since it's a lambda // expression, it is not evaluated, yet! Eventually, anchor() will be evaluated before insert() in // this example. 
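The point about evaluation order can be shown with plain lambdas, independent of the framework. In this sketch the names `anchor` and `insert` are only log labels chosen to echo the discussion; no Hook API is used or implied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class LazyLambdaOrderDemo {
    // Records the order in which the lambda bodies actually run.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Written first in source order, but only defined here: nothing runs yet.
        Supplier<String> inner = () -> { log.add("insert"); return "field"; };
        // Written second; when evaluated it does its own work, then evaluates 'inner'.
        Supplier<String> outer = () -> { log.add("anchor"); return inner.get(); };
        log.add("defined");
        outer.get();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The log shows the outer body running before the inner one even though the inner lambda appears earlier in the source, which is the same reversal a reader has to get used to when a template refers to a hook that is anchored further down in the file.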
test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 358: > 356: > 357: // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK > 358: // from the Template Library. Can you expand here on why it's better to use them instead of creating your own? Is it just readability/convenience? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 454: > 452: // For every recursion depth, some fuel is automatically subtracted > 453: // so that the fuel slowly depletes with the depth. > 454: // We keep the recursion going until the fuel is depleted. You can also note here that if we forget to check the `fuel()`, the renderer causes a stack overflow because the recursion never ends. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 487: > 485: // in this scope, and in any nested scope, including nested Templates. This allows us to > 486: // add some fields and registers in one Template, and later on, in another Template, we > 487: // can access these fields and registers again with "dataNames()". What do you mean by "registers"? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 525: > 523: private static final MySimpleInt mySimpleInt = new MySimpleInt(); > 524: > 525: // In this Example, we generate 3 fields, and add their names to the Suggestion: // In this example, we generate 3 fields, and add their names to the test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 596: > 594: @Override > 595: public boolean isSubtypeOf(DataName.Type other) { > 596: return other instanceof MyPrimitive(String n) && n == name(); Is `==` intended? Should it be `equals()`? test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 779: > 777: // Having defined these helper methods, let us start with the first example. 
> 778: // You should start reading this example bottum-up, starting at > 779: // templateClass, then going to templateMain and last to templateInnner. Suggestion: // templateClass, then going to templateMain and last to templateInner. test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 857: > 855: public static int f1 = 42; > 856: """, > 857: // But why is this DataName now availabe inside the scope of Suggestion: // But why is this DataName now available inside the scope of test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 868: > 866: // inserted into, i.e. the CLASS_HOOK. This is very important, > 867: // if we did not make that scope transparent, we could not > 868: // add any DataNames to the class scope any more, and we could Suggestion: // add any DataNames to the class scope anymore, and we could test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1063: > 1061: // And now some fields are different than before. > 1062: """, > 1063: myClassList.stream().map(c -> templateLoad.asToken(c)).toList(), Suggestion: myClassList.stream().map(c -> templateLoad.asToken(c)).toList(), """ // Now let us mutate some fields. """, myClassList.stream().map(c -> templateStore.asToken(c)).toList(), """ // And now some fields are different than before. """, myClassList.stream().map(c -> templateLoad.asToken(c)).toList(), Suggestion: myClassList.stream().map(templateLoad::asToken).toList(), """ // Now let us mutate some fields. """, myClassList.stream().map(templateStore::asToken).toList(), """ // And now some fields are different than before. """, myClassList.stream().map(templateLoad::asToken).toList(), test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1081: > 1079: // and we can read and write to them, they may be mutable or immutable. > 1080: // We now introduce another set of "Names", the "StructuralNames". 
They are > 1081: // useful for modeling method names an class names, and possibly more. Anything Suggestion: // useful for modeling method names and class names, and possibly more. Anything test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 1094: > 1092: // caught. > 1093: // > 1094: // Let us show an example with Method names. But for simplicity, we assume they Suggestion: // Let us look at an example with Method names. But for simplicity, we assume they test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestFormat.java line 49: > 47: > 48: public static void main(String[] args) { > 49: List list = new ArrayList(); Suggestion: List list = new ArrayList<>(); test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 79: > 77: @Override > 78: public boolean isSubtypeOf(DataName.Type other) { > 79: return other instanceof MyPrimitive(String n) && n == name(); Is `==` wanted and not `equals()`? ------------- PR Review: https://git.openjdk.org/jdk/pull/24217#pullrequestreview-2880263792 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115200599 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115203547 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115232385 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115240716 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115243151 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115253133 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115253959 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115254217 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115282245 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115277473 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115288457 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24217#discussion_r2115294922 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115295164 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115306383 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115306586 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115317138 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115317698 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115318545 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115318856 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115207213 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115331173 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115338444 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115348344 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115354054 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115368253 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115374255 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115368728 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115382813 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115388737 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115397938 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115406391 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115448432 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115433085 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115438108 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115479497 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24217#discussion_r2115497793 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115512637 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115515106 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115601955 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115604658 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115606795 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115320833 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115327212 From chagedorn at openjdk.org Fri May 30 10:43:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 May 2025 10:43:16 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v9] In-Reply-To: References: Message-ID: <1x3slcxP_UpVFyJ9mPy_g9W2OFC55TOoV1PrMoH2JVU=.6032f6a5-bc04-4bcf-91b5-9d588fb1d15a@github.com> On Wed, 7 May 2025 20:21:33 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 61: >> >>> 59: CompileFramework comp = new CompileFramework(); >>> 60: >>> 61: // Add java source files. >> >> Maybe it would also be nice to see the actually generated strings for the templates. Should we add an easy way to do this just for the tutorials in this file? Maybe we can do it by asking the user to pass an environment property like `-DPrintTemplates=true` or something like that. Or is there already a way provided by the framework to print the resulting templates on demand? > > There is `-DCompileFrameworkVerbose=true`, which will print the code that the CompileFramework compiles. I think that would be good enough, right? If it does not print more verbose logs than just the code, then it's fine. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115336730 From chagedorn at openjdk.org Fri May 30 10:43:16 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 May 2025 10:43:16 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 08:33:27 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 358: > >> 356: >> 357: // We saw the use of custom hooks above, but now we look at the use of CLASS_HOOK and METHOD_HOOK >> 358: // from the Template Library. > > Can you expand here on why it's better to use them instead of creating your own? Is it just readability/convenience? Another question which is not evidently clear by following the examples: Can and should (not) you use the same hook inside the hook itself, i.e.: Hooks.CLASS_HOOK.anchor( Hooks.CLASS_HOOK.anchor( // ... This is probably not done on purpose but such a situation could arise when nesting more templates and suddenly one anchors the same hook again? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2115414990 From varadam at openjdk.org Fri May 30 12:12:55 2025 From: varadam at openjdk.org (Varada M) Date: Fri, 30 May 2025 12:12:55 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:10:12 GMT, David Briemann wrote: >> The following nodes are added: >> - MinV / MaxV >> - AndV / OrV / XorV >> - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV >> - AddReductionVI / MulReductionVI > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > remove TEMP_DEF effect for dst LGTM!! ------------- Marked as reviewed by varadam (Committer). PR Review: https://git.openjdk.org/jdk/pull/25318#pullrequestreview-2881160687 From dbriemann at openjdk.org Fri May 30 12:16:56 2025 From: dbriemann at openjdk.org (David Briemann) Date: Fri, 30 May 2025 12:16:56 GMT Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes [v2] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 09:10:12 GMT, David Briemann wrote: >> The following nodes are added: >> - MinV / MaxV >> - AndV / OrV / XorV >> - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV >> - AddReductionVI / MulReductionVI > > David Briemann has updated the pull request incrementally with one additional commit since the last revision: > > remove TEMP_DEF effect for dst Thank you both for your reviews! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/25318#issuecomment-2922231330

From duke at openjdk.org  Fri May 30 12:16:56 2025
From: duke at openjdk.org (duke)
Date: Fri, 30 May 2025 12:16:56 GMT
Subject: RFR: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 22 May 2025 09:10:12 GMT, David Briemann wrote:

>> The following nodes are added:
>> - MinV / MaxV
>> - AndV / OrV / XorV
>> - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV
>> - AddReductionVI / MulReductionVI
>
> David Briemann has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove TEMP_DEF effect for dst

@dbriemann Your change (at version 8126d0db54fcc70a7f4e098ca166049a11110be1) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25318#issuecomment-2922232656

From rcastanedalo at openjdk.org  Fri May 30 12:27:54 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 May 2025 12:27:54 GMT
Subject: RFR: 8354930: IGV: dump C2 graph before and after live range stretching
In-Reply-To:
References:
Message-ID: <0BOrXqRocKbouDs_LIkGDew5H44PPxq485sTLcR86Q4=.5fd2a7df-4437-4d2e-8d1c-4aa563fd9e9d@github.com>

On Wed, 28 May 2025 11:54:24 GMT, Manuel Hässig wrote:

> This PR introduces a new phase `LIVE_RANGE_STRETCHING` that prints after live ranges have been stretched, if that happens at all. The phase `INITIAL_LIVENESS` is moved before live range stretching so we can compare the live ranges before and after stretching in IGV, which is useful for debugging why an oop suddenly belongs to an oop map.
>
> ## Testing
>
> - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15299362485)
> - [x] tier1 and tier1, plus additional Oracle internal testing for all Oracle supported platforms and OSs
> - [x] verified that the new phase prints when it should in IGV and with `-XX:PrintPhaseLevel=4`

Looks good, thanks!

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/25492#pullrequestreview-2881199160

From rcastanedalo at openjdk.org  Fri May 30 12:32:56 2025
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 30 May 2025 12:32:56 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID: <17yjI7ChbobGnY0TM9OWlMizcyOn4mWziUMKNG4F64A=.bc284e23-c5a0-4afd-b5d0-3d3064b0c193@github.com>

On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote:

>> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare.
>
> I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing.

@theRealAph do you have time to look into this, or should I proceed with the PR in its current form? The main bulk of the change is orthogonal to this discussion, and we can always revisit this part in a separate RFE if necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2115840786 From aph at openjdk.org Fri May 30 12:50:56 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 30 May 2025 12:50:56 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v5] In-Reply-To: References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com> Message-ID: On Sat, 17 May 2025 09:04:28 GMT, Andrew Haley wrote: >> OK. C2 does not currently support creating exception table entries with arbitrary offsets relative to the start address of the code emitted for a Mach node, so that support would have to be added. I prototyped this support [here](https://github.com/openjdk/jdk/compare/master...robcasloz:jdk:JDK-implicit-null-checks), see calls to `record_exception_pc_offset()`. I don't think it is, overall, simpler than the approach proposed in this PR - definitely not from a `PhaseOutput`/`C2_MacroAssembler` perspective. But if you still think it is worth exploring, I will create a new prototype with the `record_exception_pc_offset()` on top of this PR to make it easier to compare. > > I don't think you have to do that. I think you only have to mark both the lea and the memory access with an exception table entry. The segfault handler sees the two entries, deduces that this access is split into two instructions, and does the right thing. > @theRealAph do you have time to look into this, or should I proceed with the PR in its current form? The main bulk of the change is orthogonal to this discussion, and we can always revisit this part in a separate RFE if necessary. Sure, go ahead. 
I would prefer this to be done a little more neatly, but I accept your point that it's perhaps not quite as straightforward as I thought.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25066#discussion_r2115867606

From aboldtch at openjdk.org  Fri May 30 13:36:56 2025
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Fri, 30 May 2025 13:36:56 GMT
Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads [v6]
In-Reply-To:
References: <7hA9KtNbFc-SIekCv7cz2iZHgZY84B-6R4tV83brIEs=.ebc8186c-a645-4215-86cd-836f9cb5e916@github.com>
Message-ID:

On Tue, 27 May 2025 07:46:43 GMT, Roberto Castañeda Lozano wrote:

>> Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands (`MachTemp`), which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
>>
>> This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::is_late_expanded_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands:
>>
>> ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753)
>>
>> ZGC object loads are marked as late-expanded null-check candidates unconditionally on all ZGC-supported platforms except on aarch64, where only loads that do not require an initial `lea` instruction (due to [address legitimization](https://github.com/openjdk/jdk/blob/ddd07b107e814ec846579a66d4f2005b7db9bb2f/src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp#L132-L144)) are marked as candidates. Fortunately, most aarch64 loads seen in practice use small offsets and can be marked as candidates.
>>
>> Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo23 benchmarks. This results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks.
>>
>> #### Testing
>> - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode).
>
> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Include address mode test in 'legitimize_address'
>  - Excluded IR checks for testLoadVolatile on PPC64

Marked as reviewed by aboldtch (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/25066#pullrequestreview-2881385478

From duke at openjdk.org  Fri May 30 15:06:04 2025
From: duke at openjdk.org (Tom Shull)
Date: Fri, 30 May 2025 15:06:04 GMT
Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType.
Message-ID:

Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API.

To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed.

-------------

Commit messages:
 - implement getAllMethods
 - address reviewer feedback
 - Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType.

Changes: https://git.openjdk.org/jdk/pull/25498/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25498&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357987 Stats: 107 lines in 11 files changed: 106 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/25498.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25498/head:pull/25498 PR: https://git.openjdk.org/jdk/pull/25498 From dnsimon at openjdk.org Fri May 30 15:06:08 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:08 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: Message-ID: On Wed, 28 May 2025 15:55:39 GMT, Tom Shull wrote: > Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. > > To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. I also updated the title of https://bugs.openjdk.org/browse/JDK-8357987 to Not Be All Capitalized so you'll need to fix the title of this PR. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 580: > 578: C2V_END > 579: > 580: C2V_VMENTRY_0(jboolean, isOverpass,(JNIEnv* env, jobject, ARGUMENT_PAIR(method))) Delete this method - it's no longer used. 
src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 3315: > 3313: {CC "setNotInlinableOrCompilable", CC "(" HS_METHOD2 ")V", FN_PTR(setNotInlinableOrCompilable)}, > 3314: {CC "isCompilable", CC "(" HS_METHOD2 ")Z", FN_PTR(isCompilable)}, > 3315: {CC "isOverpass", CC "(" HS_METHOD2 ")Z", FN_PTR(isOverpass)}, delete src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 179: > 177: private native boolean isCompilable(HotSpotResolvedJavaMethodImpl method, long methodPointer); > 178: > 179: /** Delete this method - it's no longer used. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1162: > 1160: > 1161: /** > 1162: * Gets the {@link ResolvedJavaMethod}s for all non-overpass instance methods of {@code klass}. all non-overpass and non-constructor src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1171: > 1169: > 1170: /** > 1171: * Gets the {@link ResolvedJavaMethod}s for all instance methods of {@code klass}. instance -> non-static Instance -> NonStatic src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 583: > 581: @Override > 582: public boolean isDeclared() { > 583: if (isConstructor() || isStatic()) { `isStatic()` -> `isClassInitializer()` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 586: > 584: return false; > 585: } > 586: return !compilerToVM().isOverpass(this); I think you can do this with a direct flag check: boolean isOverpass = (getConstMethodFlags() & config().constMethodIsOverpass) != 0; return isOverpass; See #20256 as an example of the other changes needed for this. 
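The direct flag check suggested above is an ordinary bit-mask test. The sketch below uses made-up flag constants; in the JVMCI code the mask would come from the VM configuration (the comment above names `config().constMethodIsOverpass`), and the bit positions here are assumptions chosen only for illustration.

```java
class FlagCheckDemo {
    // Hypothetical bit positions; the real values are supplied by the VM.
    static final int FLAG_HAS_LINENUMBER_TABLE = 1 << 0;
    static final int FLAG_IS_OVERPASS = 1 << 3;

    // True when the overpass bit is set, regardless of any other bits.
    static boolean isOverpass(int flags) {
        return (flags & FLAG_IS_OVERPASS) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isOverpass(FLAG_IS_OVERPASS | FLAG_HAS_LINENUMBER_TABLE));
        System.out.println(isOverpass(FLAG_HAS_LINENUMBER_TABLE));
    }
}
```

Reading the flag on the Java side like this avoids a native call per query, which is presumably why the dedicated `isOverpass` entry point becomes unnecessary.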
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaMethod.java line 118: > 116: > 117: /** > 118: * Returns {@code true} if this method would be contained in the array returned by `would be` -> `is` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaType.java line 370: > 368: > 369: /** > 370: * Returns a list containing all the non-static methods present within this type. Point out that the returned list is unmodifiable (like the API for `Stream.toList()` does). test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaType.java line 1027: > 1025: ResolvedJavaType type = metaAccess.lookupJavaType(c); > 1026: Set allMethods = new HashSet<>(type.getAllMethods(true)); > 1027: boolean included = Arrays.stream(type.getDeclaredMethods()).allMatch(m -> allMethods.contains(m)); You can produce a more helpful error message by collecting the entries from getDeclaredMethods, getDeclaredConstructors and the class initialized that are *not* in `allMethods`. 
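The last suggestion, collecting the entries that are missing instead of reducing everything to one boolean, might look roughly like this. It is a sketch over plain strings; the helper name `missingFrom` and the sample data are hypothetical and stand in for the `ResolvedJavaMethod` collections used in the test.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class MissingEntriesDemo {
    // Returns the expected entries that are absent from 'actual', in encounter order.
    static <T> List<T> missingFrom(Collection<T> expected, Set<T> actual) {
        return expected.stream()
                .filter(e -> !actual.contains(e))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> declared = List.of("foo()", "bar()", "baz()");  // sample data
        Set<String> all = Set.of("foo()");                           // sample data
        List<String> missing = missingFrom(declared, all);
        if (!missing.isEmpty()) {
            // A failure message that names the offenders is easier to act on.
            System.out.println("Missing from getAllMethods: " + missing);
        }
    }
}
```

An assertion built on the returned list can then report exactly which methods were not found, rather than only that `allMatch` returned false.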
------------- PR Comment: https://git.openjdk.org/jdk/pull/25498#issuecomment-2921656256 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113593898 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113594155 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113593301 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112455015 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112455704 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112434269 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112449433 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112420844 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112451810 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2115430479 From duke at openjdk.org Fri May 30 15:06:09 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: Message-ID: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> On Wed, 28 May 2025 17:54:27 GMT, Doug Simon wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1171: > >> 1169: >> 1170: /** >> 1171: * Gets the {@link ResolvedJavaMethod}s for all instance methods of {@code klass}. 
> > instance -> non-static > Instance -> NonStatic I realized NonStatic is not accurate - we return everything except `<init>s` and `<clinit>` - so I switched to `NonInitializerMethods` everywhere. Does that seem fair? > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 586: > >> 584: return false; >> 585: } >> 586: return !compilerToVM().isOverpass(this); > > I think you can do this with a direct flag check: > > boolean isOverpass = (getConstMethodFlags() & config().constMethodIsOverpass) != 0; > return isOverpass; > > See #20256 as an example of the other changes needed for this. good call. changed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112886022 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2112884946 From duke at openjdk.org Fri May 30 15:06:09 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> References: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> Message-ID: On Wed, 28 May 2025 22:46:46 GMT, Tom Shull wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 1171: >> >>> 1169: >>> 1170: /** >>> 1171: * Gets the {@link ResolvedJavaMethod}s for all instance methods of {@code klass}. >> >> instance -> non-static >> Instance -> NonStatic > > I realized NonStatic is not accurate - we return everything except `<init>s` and `<clinit>` - so I switched to `NonInitializerMethods` everywhere. Does that seem fair? thinking about it more, it's probably better if we do no filtering and return all methods in `InstanceKlass->_methods`. How about something like `getAllMethods`: ``` /** * Returns a list containing all methods present within this type.
This list can include * methods implicitly created and used by the VM. * The returned List is unmodifiable; calls to any mutator method * will always cause {@code UnsupportedOperationException} to be thrown. * * @param forceLink if {@code true}, forces this type to be {@link #link linked} */ List getAllMethods(boolean forceLink); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113338609 From dnsimon at openjdk.org Fri May 30 15:06:09 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> Message-ID: <5TwuZTOvXugCHTiNOQpfYWtfwgV9b0HTyzoPdRMSB3U=.8e616674-895a-4c16-9d4e-2655b7b410f7@github.com> On Thu, 29 May 2025 06:56:18 GMT, Tom Shull wrote: >> I realized NonStatic is not accurate - we return everything except `<init>s` and `<clinit>` - so I switched to `NonInitializerMethods` everywhere. Does that seem fair? > > thinking about it more, it's probably better if we do no filtering and return all methods in `InstanceKlass->_methods`. How about something like `getAllMethods`: > > ``` > /** > * Returns a list containing all methods present within this type. This list can include > * methods implicitly created and used by the VM. > * The returned List is unmodifiable; calls to any mutator method > * will always cause {@code UnsupportedOperationException} to be thrown. > * > * @param forceLink if {@code true}, forces this type to be {@link #link linked} > */ > List getAllMethods(boolean forceLink); Yes, that's a good idea - it's more future proof and lets the caller do the filtering.
`This list can include methods implicitly created and used by the VM that are not present in {@link #getDeclaredMethods}.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113592457 From duke at openjdk.org Fri May 30 15:06:29 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:29 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool Message-ID: This PR adds support for directly retrieving all invokedynamic BootstrapMethodInvocations from a ConstantPool. In addition, two methods are added to the BootstrapMethodInvocations: 1. `void resolveInvokeDynamic()` 2. `JavaConstant lookupInvokeDynamicAppendix()` The combination of these two features allows one to directly interact with all invokedynamic information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes ------------- Commit messages: - complete changes - commit review suggestion - commit review suggestion - change to allow both indys and condys to be looked up all at once - address reviewer feedback - style fixes and add testing to TestDynamicConstants. - Add support for retrieving all Indy BootstrapMethodInvocations from Constant Pool. Changes: https://git.openjdk.org/jdk/pull/25420/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25420&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8357660 Stats: 142 lines in 5 files changed: 130 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/25420.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25420/head:pull/25420 PR: https://git.openjdk.org/jdk/pull/25420 From duke at openjdk.org Fri May 30 15:06:09 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. 
In-Reply-To: <5TwuZTOvXugCHTiNOQpfYWtfwgV9b0HTyzoPdRMSB3U=.8e616674-895a-4c16-9d4e-2655b7b410f7@github.com> References: <621JpJVqtfhOtmuHd54KXE7kbOW_RzTQuudFesTADJ0=.d0985feb-fd1a-45eb-8246-261cb3127d2a@github.com> <5TwuZTOvXugCHTiNOQpfYWtfwgV9b0HTyzoPdRMSB3U=.8e616674-895a-4c16-9d4e-2655b7b410f7@github.com> Message-ID: <3kzvHswjZ98huXibmqouApRGInSf3rwkIwQReBOCANc=.c51d8496-6032-4387-8b5b-fba8b1d7adf4@github.com> On Thu, 29 May 2025 09:40:37 GMT, Doug Simon wrote: >> thinking about it more, it's probably better if we do no filtering and return all methods in `InstanceKlass->_methods`. How about something like `getAllMethods`: >> >> ``` >> /** >> * Returns a list containing all methods present within this type. This list can include >> * methods implicitly created and used by the VM. >> * The returned List is unmodifiable; calls to any mutator method >> * will always cause {@code UnsupportedOperationException} to be thrown. >> * >> * @param forceLink if {@code true}, forces this type to be {@link #link linked} >> */ >> List getAllMethods(boolean forceLink); > > Yes, that's a good idea - it's more future proof and lets the caller do the filtering. > > `This list can include methods implicitly created and used by the VM that are not present in {@link #getDeclaredMethods}.` I changed it now to be `getAllMethods` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2114796740 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: Message-ID: On Fri, 23 May 2025 17:37:14 GMT, Tom Shull wrote: > This PR adds support for directly retrieving all invokedynamic BootstrapMethodInvocations from a ConstantPool. > > In addition, two methods are added to the BootstrapMethodInvocations: > 1. `void resolveInvokeDynamic()` > 2. 
`JavaConstant lookupInvokeDynamicAppendix()` > > The combination of these two features allows one to directly interact with all invokedynamic information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes Please add some tests for the new methods to `test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestDynamicConstant.java`. I also updated the title of https://bugs.openjdk.org/browse/JDK-8357660 to Not Be All Capitalized so you'll need to fix the title of this PR. Also, please update both titles and descriptions further to reflect the final changes (i.e. lookupBootstrapMethodInvocations instead of lookupIndyBootstrapMethodInvocations). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/CompilerToVM.java line 476: > 474: > 475: /** > 476: * Returns the number of {@code ResolvedIndyEntry} present within this constant `{@code ResolvedIndyEntry}` -> `{@code ResolvedIndyEntry}s` src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 540: > 538: private final JavaConstant type; > 539: private final List staticArguments; > 540: private final int index; index -> cpiOrIndyIndex src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 651: > 649: return List.of(); > 650: } > 651: return IntStream.range(0, numIndys).mapToObj(i -> lookupBootstrapMethodInvocation(i, Bytecodes.INVOKEDYNAMIC)) Suggestion: return IntStream.range(0, numIndys) .mapToObj(i -> lookupBootstrapMethodInvocation(i, Bytecodes.INVOKEDYNAMIC)) .toList(); src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 654: > 652: .toList(); > 653: } else { > 654: return IntStream.range(1, length()).filter(cpi -> { Suggestion: return IntStream.range(1, length()) .filter(this::isDynamicEntry) .mapToObj(...); and: private boolean isDynamicEntry(int cpi) { JvmConstant tagAt = getTagAt(cpi); return tagAt != null && 
tagAt.name.equals("Dynamic"); } src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 657: > 655: } else { > 656: return IntStream.range(1, length()) > 657: .filter(this::isDynamicEntry) Looks like you forgot to add the definition of `isDynamicEntry` that I suggested: private boolean isDynamicEntry(int cpi) { JvmConstant tagAt = getTagAt(cpi); return tagAt != null && tagAt.name.equals("Dynamic"); } src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 198: > 196: * If this bootstrap method invocation is for a {@code > 197: * CONSTANTAdd_InvokeDynamic_info} pool entry, then this method ensures the > 198: * invoke dynamic is resolved. This can be used to compile time resolve the What exactly does resolving an invoke dynamic mean? Also I would leave out the sentence about "compile time" unless you clarify exactly what that means. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 233: > 231: > 232: /** > 233: * Returns the BootstrapMethodInvocation instances for all invokedynamic Point out that the returned list is unmodifiable (like the API for `Stream.toList()` does). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 237: > 235: * is returned. > 236: */ > 237: BootstrapMethodInvocation[] lookupAllIndyBootstrapMethodInvocations(); Why not make this return all `BootstrapMethodInvocation`s? The caller can then filter out the indy ones with `isInvokeDynamic`. Also, please return a `List` instead of an array - we should never return arrays from JVMCI (see #23159 as an example of addressing existing API). Lastly, return `List.of()` instead of null. 
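To illustrate the shape these suggestions point toward (a filtering predicate, a `List` result, and `List.of()` rather than null), here is a small standalone sketch. The `tags` array and the `dynamicEntryIndexes` method are stand-ins invented for this example; the real code works on `HotSpotConstantPool` and `JvmConstant`, which are not reproduced here.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-in for a constant pool: tags[cpi] names the entry kind at index cpi.
class DynamicEntrySketch {
    private final String[] tags;

    DynamicEntrySketch(String... tags) {
        this.tags = tags;
    }

    // Mirrors the suggested predicate: null-safe comparison against the "Dynamic" tag.
    private boolean isDynamicEntry(int cpi) {
        String tagAt = tags[cpi];
        return tagAt != null && tagAt.equals("Dynamic");
    }

    // Returns an unmodifiable list, never null and never an array.
    // Index 0 is skipped, matching the IntStream.range(1, length()) in the review.
    List<Integer> dynamicEntryIndexes() {
        if (tags.length <= 1) {
            return List.of();
        }
        return IntStream.range(1, tags.length)
                        .filter(this::isDynamicEntry)
                        .boxed()
                        .toList();
    }
}
```

Callers get an empty list when nothing matches, and since both `List.of()` and `Stream.toList()` produce unmodifiable lists, any attempt to mutate the result throws `UnsupportedOperationException`.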
------------- PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2906643446 PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2921667337 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2107428322 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2115447272 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114177826 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114187417 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114737379 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2107430633 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2112429562 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2107441215 From dnsimon at openjdk.org Fri May 30 15:06:09 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:09 GMT Subject: RFR: 8357987: [JVMCI] Add Support for Retrieving All Non-Static Methods of a ResolvedJavaType. In-Reply-To: References: Message-ID: On Wed, 28 May 2025 17:41:15 GMT, Doug Simon wrote: >> Currently from ResolvedJavaType one can retrieve all declared methods, static methods, and constructors of the given type. However, internally in HotSpot there are also VM-internal methods, such as overpass methods, associated with a given type which we cannot access via the API. >> >> To correct this, we should add a new method which enables VM-internal methods, such as overpass methods, to be accessed. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedJavaMethodImpl.java line 583: > >> 581: @Override >> 582: public boolean isDeclared() { >> 583: if (isConstructor() || isStatic()) { > > `isStatic()` -> `isClassInitializer()` Looks like you did not yet make the `isClassInitializer()` fix. This also implies some missing test coverage in TestResolvedJavaType. Can you please address both these issues. 
> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ResolvedJavaMethod.java line 118: > >> 116: >> 117: /** >> 118: * Returns {@code true} if this method would be contained in the array returned by > > `would be` -> `is` not yet fixed (or pushed?) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113583058 PR Review Comment: https://git.openjdk.org/jdk/pull/25498#discussion_r2113587598 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: Message-ID: On Sat, 24 May 2025 08:49:54 GMT, Doug Simon wrote: >> This PR adds support for directly retrieving all invokedynamic BootstrapMethodInvocations from a ConstantPool. >> >> In addition, two methods are added to the BootstrapMethodInvocations: >> 1. `void resolveInvokeDynamic()` >> 2. `JavaConstant lookupInvokeDynamicAppendix()` >> >> The combination of these two features allows one to directly interact with all invokedynamic information of a given ConstantPool without having to iterate through all of the Classfile's methods to find all invokedynamic bytecodes > > Please add some tests for the new methods to `test/hotspot/jtreg/compiler/jvmci/jdk.vm.ci.hotspot.test/src/jdk/vm/ci/hotspot/test/TestDynamicConstant.java`. 
@dougxc I integrated testing for the new methods into `TestDynamicConstant.java` now @dougxc I cleaned up the PR to now have the symmetric lookup option and updated the tests > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 657: > >> 655: } else { >> 656: return IntStream.range(1, length()) >> 657: .filter(this::isDynamicEntry) > > Looks like you forgot to add the definition of `isDynamicEntry` that I suggested: > > private boolean isDynamicEntry(int cpi) { > JvmConstant tagAt = getTagAt(cpi); > return tagAt != null && tagAt.name.equals("Dynamic"); > } Yes, I applied the suggested change via github, and am just validating it works now (which of course it doesn't). I'll fix it > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 198: > >> 196: * If this bootstrap method invocation is for a {@code >> 197: * CONSTANTAdd_InvokeDynamic_info} pool entry, then this method ensures the >> 198: * invoke dynamic is resolved. This can be used to compile time resolve the > > What exactly does resolving an invoke dynamic mean? > Also I would leave out the sentence about "compile time" unless you clarify exactly what that means. Would you want me to add a reference to https://docs.oracle.com/javase/specs/jvms/se24/html/jvms-5.html#jvms-5.4.3.6? I removed the compile time sentence; I had it to be consistent with `loadReferencedType` > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 237: > >> 235: * is returned. >> 236: */ >> 237: BootstrapMethodInvocation[] lookupAllIndyBootstrapMethodInvocations(); > > Why not make this return all `BootstrapMethodInvocation`s? The caller can then filter out the indy ones with `isInvokeDynamic`. Also, please return a `List` instead of an array - we should never return arrays from JVMCI (see #23159 as an example of addressing existing API). Lastly, return `List.of()` instead of null. Changed to return a list. 
> Why not make this return all BootstrapMethodInvocations 1. Within HotSpot it is very easy to pick off all indy BootstrapMethodInvocations via [the ConstantPoolCache](https://github.com/openjdk/jdk/blob/72a3022dc6a1521d8e3f08fe5d592f760fc462d2/src/hotspot/share/oops/cpCache.hpp#L74) 2. Each invokedynamic bytecode location has a unique BootstrapMethodInvocation instance, but they may share the same constant pool entry, so it's not trivial to find all BootstrapMethodInvocations. One would have to iterate both all method bytecodes and constant pool slots, and do some additional filtering. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2909796813 PR Comment: https://git.openjdk.org/jdk/pull/25420#issuecomment-2918426821 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114780251 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109301347 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109317539 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: Message-ID: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> On Tue, 27 May 2025 14:07:21 GMT, Tom Shull wrote: > Would you want me to add a reference The main point is that resolving can execute Java code (as far as I recall) so cannot be called from a CompileBroker thread as these threads must not call Java code. However, I see that this constraint is not currently documented so it ok to leave it out for now. >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java line 237: >> >>> 235: * is returned. 
>>> 236: */ >>> 237: BootstrapMethodInvocation[] lookupAllIndyBootstrapMethodInvocations(); >> >> Why not make this return all `BootstrapMethodInvocation`s? The caller can then filter out the indy ones with `isInvokeDynamic`. Also, please return a `List` instead of an array - we should never return arrays from JVMCI (see #23159 as an example of addressing existing API). Lastly, return `List.of()` instead of null. > > Changed to return a list. > >> Why not make this return all BootstrapMethodInvocations > 1. Within HotSpot it is very easy to pick off all indy BootstrapMethodInvocations via [the ConstantPoolCache](https://github.com/openjdk/jdk/blob/72a3022dc6a1521d8e3f08fe5d592f760fc462d2/src/hotspot/share/oops/cpCache.hpp#L74) > 2. Each invokedynamic bytecode location has a unique BootstrapMethodInvocation instance, but they may share the same constant pool entry, so it's not trivial to find all BootstrapMethodInvocations. One would have to iterate both all method bytecodes and constant pool slots, and do some additional filtering. How about `List lookupBootstrapMethodInvocations(boolean indy)`? That is, it either gets the indy *or* the condy BSM invocations. I can imagine SVM wanting the latter at some point right? BTW, I noticed that the javadoc for `ConstantPool.lookupBootstrapMethodInvocation` is somewhat incorrect. Please check and apply these corrections in this PR: diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java index 2273b256f03..3519af4bcbb 100644 --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java @@ -199,12 +199,12 @@ interface BootstrapMethodInvocation { * in the constant pool. * * @param index if {@code opcode} is -1, {@code index} is a constant pool index. 
Otherwise {@code opcode} - * must be {@code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that - * opcode in the bytecode stream (i.e., a {@code rawIndex}). - * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, or -1 if + * must be {@code Bytecodes.INVOKEDYNAMIC} or {@code CONSTANT_Dynamic_info}, and {@code index} + * must be the operand of that opcode in the bytecode stream (i.e., a {@code rawIndex}). + * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, {@code CONSTANT_Dynamic_info}, or -1 if * {@code index} was not decoded from a bytecode stream * @return the bootstrap method invocation details or {@code null} if the entry specified by {@code index} - * is not a {@code CONSTANT_Dynamic_info} or @{code CONSTANT_InvokeDynamic_info} + * is not a {@code CONSTANT_Dynamic_info} or {@code CONSTANT_InvokeDynamic_info} * @jvms 4.7.23 The {@code BootstrapMethods} Attribute */ default BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109436288 PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2109450651 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> Message-ID: On Tue, 27 May 2025 15:03:02 GMT, Doug Simon wrote: >> Changed to return a list. >> >>> Why not make this return all BootstrapMethodInvocations >> 1. 
Within HotSpot it is very easy to pick off all indy BootstrapMethodInvocations via [the ConstantPoolCache](https://github.com/openjdk/jdk/blob/72a3022dc6a1521d8e3f08fe5d592f760fc462d2/src/hotspot/share/oops/cpCache.hpp#L74) >> 2. Each invokedynamic bytecode location has a unique BootstrapMethodInvocation instance, but they may share the same constant pool entry, so it's not trivial to find all BootstrapMethodInvocations. One would have to iterate both all method bytecodes and constant pool slots, and do some additional filtering. > > How about `List lookupBootstrapMethodInvocations(boolean indy)`? That is, it either gets the indy *or* the condy BSM invocations. I can imagine SVM wanting the latter at some point right? > > BTW, I noticed that the javadoc for `ConstantPool.lookupBootstrapMethodInvocation` is somewhat incorrect. Please check and apply these corrections in this PR: > > diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java > index 2273b256f03..3519af4bcbb 100644 > --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java > +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java > @@ -199,12 +199,12 @@ interface BootstrapMethodInvocation { > * in the constant pool. > * > * @param index if {@code opcode} is -1, {@code index} is a constant pool index. Otherwise {@code opcode} > - * must be {@code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that > - * opcode in the bytecode stream (i.e., a {@code rawIndex}). > - * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, or -1 if > + * must be {@code Bytecodes.INVOKEDYNAMIC} or {@code CONSTANT_Dynamic_info}, and {@code index} > + * must be the operand of that opcode in the bytecode stream (i.e., a {@code rawIndex}). 
> + * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, {@code CONSTANT_Dynamic_info}, or -1 if > * {@code index} was not decoded from a bytecode stream > * @return the bootstrap method invocation details or {@code null} if the entry specified by {@code index} > - * is not a {@code CONSTANT_Dynamic_info} or @{code CONSTANT_InvokeDynamic_info} > + * is not a {@code CONSTANT_Dynamic_info} or {@code CONSTANT_InvokeDynamic_info} > * @jvms 4.7.23 The {@code BootstrapMethods} Attribute > */ > default BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { I prototyped the option `List lookupBootstrapMethodInvocations(boolean indy)` here: https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative As part of this I also prototyped generic BSM resolution / lookup logic. From the SVM perspective, retrieving condys via this new support isn't a big win. It's easy enough already to walk the ConstantPool. However, for symmetry purposes, it is reasonable to have this method (along with the resolve / lookup). What's your preference: this new version or the original? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2110104069 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> Message-ID: <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> On Tue, 27 May 2025 20:10:50 GMT, Tom Shull wrote: >> How about `List lookupBootstrapMethodInvocations(boolean indy)`? That is, it either gets the indy *or* the condy BSM invocations. I can imagine SVM wanting the latter at some point right?
>> >> BTW, I noticed that the javadoc for `ConstantPool.lookupBootstrapMethodInvocation` is somewhat incorrect. Please check and apply these corrections in this PR: >> >> diff --git a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java >> index 2273b256f03..3519af4bcbb 100644 >> --- a/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java >> +++ b/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ConstantPool.java >> @@ -199,12 +199,12 @@ interface BootstrapMethodInvocation { >> * in the constant pool. >> * >> * @param index if {@code opcode} is -1, {@code index} is a constant pool index. Otherwise {@code opcode} >> - * must be {@code Bytecodes.INVOKEDYNAMIC}, and {@code index} must be the operand of that >> - * opcode in the bytecode stream (i.e., a {@code rawIndex}). >> - * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, or -1 if >> + * must be {@code Bytecodes.INVOKEDYNAMIC} or {@code CONSTANT_Dynamic_info}, and {@code index} >> + * must be the operand of that opcode in the bytecode stream (i.e., a {@code rawIndex}). 
>> + * @param opcode must be {@code Bytecodes.INVOKEDYNAMIC}, {@code CONSTANT_Dynamic_info}, or -1 if >> * {@code index} was not decoded from a bytecode stream >> * @return the bootstrap method invocation details or {@code null} if the entry specified by {@code index} >> - * is not a {@code CONSTANT_Dynamic_info} or @{code CONSTANT_InvokeDynamic_info} >> + * is not a {@code CONSTANT_Dynamic_info} or {@code CONSTANT_InvokeDynamic_info} >> * @jvms 4.7.23 The {@code BootstrapMethods} Attribute >> */ >> default BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { > > I prototyped the option `List lookupBootstrapMethodInvocations(boolean indy)` here: https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative > > As part of this I also prototyped generic BSM resolution / lookup logic > > From the SVM perspective, retrieving condys via this new support isn't a big win. It's easy enough already to walk the ConstantPool. However, for symmetry purposes, it is reasonable to have this method (along with the resolve / lookup). What's your preference: this new version or the original? I like the symmetry of the new version. Also, I think you can simplify things by replacing use of `flatMap` [here](https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative#diff-b782878562668748c5c59acc2e937f7c24de4529b8a74bd3a4eae83fa0e07846R679) with `filter`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2111539245 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> Message-ID: On Wed, 28 May 2025 10:45:07 GMT, Doug Simon wrote: >> I prototyped the option `List lookupBootstrapMethodInvocations(boolean indy)` here: https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative >> >> As part of this I also prototyped generic BSM resolution / lookup logic >> >> From the SVM perspective, retrieving condys via this new support isn't a big win. It's easy enough already to walk the ConstantPool. However, for symmetry purposes, it is reasonable to have this method (along with the resolve / lookup). What's your preference: this new version or the original? > > I like the symmetry of the new version. Also, I think you can simplify things by replacing use of `flatMap` [here](https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative#diff-b782878562668748c5c59acc2e937f7c24de4529b8a74bd3a4eae83fa0e07846R679) with `filter`. I updated the javadoc misplaced `@` in `{@code}`. 
However, the `opcode` doc changes look wrong to me; the opcode must be -1 or INVOKEDYNAMIC (https://github.com/openjdk/jdk/blob/04e0fe00abcf1d7919a50e0c9dd44ce2856984ea/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java#L592) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2113271157 From dnsimon at openjdk.org Fri May 30 15:06:30 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> Message-ID: On Thu, 29 May 2025 06:04:24 GMT, Tom Shull wrote: >> I like the symmetry of the new version. Also, I think you can simplify things by replacing use of `flatMap` [here](https://github.com/openjdk/jdk/compare/master...teshull:jdk:jvmci_bootstrap_alternative#diff-b782878562668748c5c59acc2e937f7c24de4529b8a74bd3a4eae83fa0e07846R679) with `filter`. > > I updated the javadoc misplaced `@` in `{@code}`. However, the `opcode` doc changes look wrong to me; the opcode must be -1 or INVOKEDYNAMIC (https://github.com/openjdk/jdk/blob/04e0fe00abcf1d7919a50e0c9dd44ce2856984ea/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java#L592) yeah, looks like you're right. I was basing my assumption on `case "Dynamic"` in: @Override public BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { int cpi = opcode == -1 ? index : indyIndexConstantPoolIndex(index, opcode); final JvmConstant tag = getTagAt(cpi); switch (tag.name) { case "InvokeDynamic": case "Dynamic": I guess it's possible for an INVOKEDYNAMIC to resolve it's cpi to a CONSTANT_Dynamic entry. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2113973088 From duke at openjdk.org Fri May 30 15:06:30 2025 From: duke at openjdk.org (Tom Shull) Date: Fri, 30 May 2025 15:06:30 GMT Subject: RFR: 8357660: [JVMCI] Add Support for Retrieving All Indy BootstrapMethodInvocations directly from the ConstantPool In-Reply-To: References: <1AMsWwdYheV0CZ9z_VWbiEPphQwkJz-HO6h-wYNCAfw=.8259a98a-89aa-40e6-98da-81c43d2a45e0@github.com> <3Lyb5MHjplhxqRmlkR6y-GpgQWe90ij_jClRdipKMQE=.cf4fcf50-4b4f-4930-abdb-75f9d0be9942@github.com> Message-ID: On Thu, 29 May 2025 13:40:55 GMT, Doug Simon wrote: >> I updated the javadoc misplaced `@` in `{@code}`. However, the `opcode` doc changes look wrong to me; the opcode must be -1 or INVOKEDYNAMIC (https://github.com/openjdk/jdk/blob/04e0fe00abcf1d7919a50e0c9dd44ce2856984ea/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java#L592) > > yeah, looks like you're right. I was basing my assumption on `case "Dynamic"` in: > > @Override > public BootstrapMethodInvocation lookupBootstrapMethodInvocation(int index, int opcode) { > int cpi = opcode == -1 ? index : indyIndexConstantPoolIndex(index, opcode); > final JvmConstant tag = getTagAt(cpi); > switch (tag.name) { > case "InvokeDynamic": > case "Dynamic": > > I guess it's possible for an INVOKEDYNAMIC to resolve it's cpi to a CONSTANT_Dynamic entry. I think INVOKEDYNAMIC should always point to a CONSTANT_InvokeDynamic entry ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25420#discussion_r2114794800 From mchevalier at openjdk.org Fri May 30 15:42:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 30 May 2025 15:42:03 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 Message-ID: ### Problem On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. 
For instance, that's what we get with the reproducer `Reduced.java` on the related issue: ; Load lFld into local x ldr x11, [x10, #120] ; popCountI mov w11, w11 mov v16.d[0], x11 cnt v16.8b, v16.8b addv b16, v16.8b mov x13, v16.d[0] ; [...] ; store local x (which is believed to still contain lFld) into result str x11, [x10, #128] The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ match(Set dst (PopCountI src)); effect(TEMP tmp); [...] %} But then, why resetting the upper word of `x11`? It all starts with vector instructions: cnt v16.8b, v16.8b addv b16, v16.8b The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing mov v16.s[0], w11 would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which mov w11, w11 mov v16.d[0], x11 does, but by destroying `x11`. 
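A minimal Java sketch of why the upper word has to be cleared before the 8-byte `cnt`/`addv` pair, assuming the low 64-bit lane of the vector register can be modelled as a `long`. The class and method names are hypothetical and only illustrate the reasoning above; this is not the HotSpot implementation.

```java
public class PopCountLaneSketch {
    // Models cnt + addv over the 8 low bytes of the vector register:
    // every set bit of the 64-bit lane is counted, including the upper word.
    static int countLane64(long lane) {
        return Long.bitCount(lane);
    }

    // Models a popCountI that only lets the low 32 bits of the source
    // reach the lane; the upper word of the lane is zero.
    static int popCountI(long srcRegister) {
        long lane = srcRegister & 0xFFFF_FFFFL;
        return countLane64(lane);
    }

    public static void main(String[] args) {
        // A 64-bit register whose low word holds the int 0xFF and whose
        // upper word holds unrelated bits.
        long x11 = 0xFFFF_0000_0000_00FFL;

        System.out.println(countLane64(x11));             // 24: upper bits pollute the count
        System.out.println(popCountI(x11));               // 8
        System.out.println(Integer.bitCount((int) x11));  // 8, the expected result
    }
}
```

The sketch only shows the counting side. Whether the upper word is cleared in the source register (the buggy sequence) or directly in the vector lane is exactly the choice the email goes on to discuss.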
### Solution Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or an L2I of a 64-bit register) and those cannot be used in effect lists. The approach I went for is not to modify the source, but rather to write the two lower words of `v16` we are interested in separately: mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 mov v16.s[0], w11 ; Set the 0-indexed word of v16 to w11, that is v16[0:31] <- w11 cnt v16.8b, v16.8b addv b16, v16.8b mov x13, v16.d[0] Unlike other solutions, this is relatively straightforward as it doesn't write the same bits twice, as for instance, this would: mov v16.d[0], xzr ; Reset the 0-indexed double word of v16, that is v16[0:63] <- 0 mov v16.s[0], w11 ; Set the 0-indexed word of v16 to w11, that is v16[0:31] <- w11 and it doesn't use additional temporaries, like this would: mov w12, w11 ; Using a fresh register x12 mov v16.d[0], x12 Using the zero register rather than an immediate is convenient as it allows setting 32 bits at once, while a 32-bit immediate would not fit in a single instruction. ### Format The printing of this instruction is not very satisfactory. We used to have something that renders in OptoAssembly movw l2i(R29), l2i(R29) mov V16, l2i(R29) # vector (1D) cnt V16, V16 # vector (8B) addv V16, V16 # vector (8B) mov R13, V16 # vector (1D) This is... somewhat arguable. With context, I can understand or guess what `movw l2i(R29), l2i(R29)` means, but I don't think it's a very nice printout. Also, it's not clear that the second instruction works on the lower word of `V16`. Alas, my new version is not much better: mov V16, zr # vector (1S) mov V16, l2i(R29) # vector (1S) cnt V16, V16 # vector (8B) addv V16, V16 # vector (8B) mov R13, V16 # vector (1D) It's not clear that the first instruction is on the 1-indexed word of `V16` while the second is on the 0-indexed word.
I couldn't find a nicer example in a similar situation, so I'm open to suggestions! Maybe simply hardcoding it in the format? as such: format %{ "mov $tmp.s[1], zr\t# vector (1S)\n\t" "mov $tmp.s[0], $src\t# vector (1S)\n\t" "cnt $tmp, $tmp\t# vector (8B)\n\t" "addv $tmp, $tmp\t# vector (8B)\n\t" "mov $dst, $tmp\t# vector (1D)" %} Not sure what's the best practice here. ------------- Commit messages: - Add randomization - Adapt test - Add test - Don't change src, set directly in the vector register Changes: https://git.openjdk.org/jdk/pull/25551/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25551&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353266 Stats: 84 lines in 2 files changed: 80 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/25551.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25551/head:pull/25551 PR: https://git.openjdk.org/jdk/pull/25551 From mchevalier at openjdk.org Fri May 30 15:42:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 30 May 2025 15:42:03 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: <0Om_yBwlW-qu-buvKTqJZy2STNRMxgms1b_XLgFjRq0=.3e900e23-5108-4206-988d-bbaa175994de@github.com> On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier wrote: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] 
> ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing > > mov v16.s[0], w11 > > would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which > > mov w11, w11 > mov v16.d[0], x11 > > does, but by destroying `x11`. > > ### Solution > > Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. 
> > The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: > > mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 > mov v16.s[0], w11 ; Set the 0-ind... Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate? ------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2922724166 From never at openjdk.org Fri May 30 16:07:52 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 30 May 2025 16:07:52 GMT Subject: RFR: 8357619: [JVMCI] Revisit phantom_ref parameter in JVMCINMethodData::get_nmethod_mirror In-Reply-To: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> References: <37LbN00VRPqAt9LN8jx43xx3QGsF6jnPFS_OQLUa-0U=.687f6afe-d13a-4d03-af0c-ac91a9862b13@github.com> Message-ID: On Wed, 28 May 2025 10:28:38 GMT, Doug Simon wrote: > The point of the `phantom_ref` parameter (introduced by [JDK-8234359](https://bugs.openjdk.org/browse/JDK-8234359)) of `JVMCINMethodData::get_nmethod_mirror` is to avoid the special resurrection semantics of a phantom read when reading the field during GC, which is when `JVMCINMethodData::invalidate_nmethod_mirror` can be called. > This case can be handled directly in `JVMCINMethodData::invalidate_nmethod_mirror` and so the `phantom_ref` parameter can be removed. src/hotspot/share/jvmci/jvmciRuntime.cpp line 801: > 799: > 800: void JVMCINMethodData::invalidate_nmethod_mirror(nmethod* nm) { > 801: if (_nmethod_mirror_index == -1) { This part is actually wrong as that's the first part of `get_nmethod_mirror` and we must always check that `get_nmethod_mirror` doesn't return nullptr. I'd assumed that the mirror was always non-null if `_nmethod_mirror_index != -1` but that's not true. The slot is reserved for all non-default nmethods and must stay around so that `translate` can work. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25488#discussion_r2116193278 From rkennke at openjdk.org Fri May 30 16:13:25 2025 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 30 May 2025 16:13:25 GMT Subject: RFR: 8358169: Shenandoah/JVMCI: Export GC state constants Message-ID: We need the GC state enum constants available in JVMCI. ------------- Commit messages: - 8358169: Shenandoah/JVMCI: Export GC state constants Changes: https://git.openjdk.org/jdk/pull/25552/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25552&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358169 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25552.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25552/head:pull/25552 PR: https://git.openjdk.org/jdk/pull/25552 From dnsimon at openjdk.org Fri May 30 16:39:51 2025 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 30 May 2025 16:39:51 GMT Subject: RFR: 8358169: Shenandoah/JVMCI: Export GC state constants In-Reply-To: References: Message-ID: On Fri, 30 May 2025 16:09:03 GMT, Roman Kennke wrote: > We need the GC state enum constants available in JVMCI. Looks good. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25552#pullrequestreview-2881865876 From eastigeevich at openjdk.org Fri May 30 16:58:00 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 30 May 2025 16:58:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Thu, 29 May 2025 23:18:43 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). 
It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: > > - Add requires GC to tests > - Add type for immutable_data_references > - Fix incorrect destination set if no trampoline available > - Update assert note in nmethod::clear_inline_caches src/hotspot/share/code/nmethod.hpp line 28: > 26: #define SHARE_CODE_NMETHOD_HPP > 27: > 28: #define IMMUTABLE_DATA_REFERENCES int >From https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md#naming > Type names and global names should use mixed-case with the first letter of each word capitalized (FooBar). Instead of macro, use `using AaaBbbb = int;`. You can move it inside `nmethod` declaration. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2116263311 From jbhateja at openjdk.org Fri May 30 17:40:16 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 May 2025 17:40:16 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v8] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. 
> > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/4c4d1688..4065fb9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=06-07 Stats: 105 lines in 1 file changed: 77 ins; 10 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From jbhateja at openjdk.org Fri May 30 17:46:00 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 30 May 2025 17:46:00 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v8] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 17:40:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. 
Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions We can further constrain the value range bounds of bit compression and expansion once PR #17508 gets integrated. For now, I have developed the following draft, which demonstrates bound constraining with KnownBitsLattice. // // Prototype of bit compress/expand value range computation // using KnownBits infrastructure. // #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <assert.h> template <typename U> class KnownBitsLattice { private: U zeros; U ones; public: KnownBitsLattice(U lb, U ub); U getKnownZeros() { return zeros; } U getKnownOnes() { return ones; } long getKnownZerosCount() { uint64_t count = 0; asm volatile ("popcntq %1, %0 \n\t" : "=r"(count) : "r"(zeros) : "cc"); return count; } long getKnownOnesCount() { uint64_t count = 0; asm volatile ("popcntq %1, %0 \n\t" : "=r"(count) : "r"(ones) : "cc"); return count; } bool check_voilation() { // A given bit cannot be both zero and one. return (zeros & ones) != 0; } bool is_MSB_KnownOneBitsSet() { return (ones >> 63) == 1; } bool is_MSB_KnownZeroBitsSet() { return (zeros >> 63) == 1; } }; template <typename U> KnownBitsLattice<U>::KnownBitsLattice(U lb, U ub) { // To find KnownBitsLattice from a given value range // we first find the common prefix b/w upper and lower // bound, we then concretize known zeros and ones bit // based on common prefix. // e.g.
// lb = 00110001 // ub = 00111111 // common prefix = 0011XXXX // knownbits.zeros = 11000000 // knownbits.ones = 00110000 // // conversely, for a give knownbits value we can find // lower and upper value ranges. // e.g. // knownbits.zeros = 0x00010001 // knownbits.ones = 0x10001100 // range.lo = knownbits.ones, this is because knownbits.ones are // guaranteed to be one. // range.hi = ~knownbits.zeros, this is an optimistic upper bound // which assumes all unset knownbits.zero // are ones. // Thus in above example, // range.lo = 0x8C // range.hi = 0xEE U lzcnt = 0; U common_prefix = lb ^ ub; asm volatile ("lzcntq %1, %0 \n\t" : "=r"(lzcnt) : "r"(common_prefix) : "cc"); U common_prefix_mask = lzcnt == 0 ? 0xFFFFFFFFFFFFFFFFL : ~((1ULL << (64 - lzcnt)) - 1); zeros = (~lb) & common_prefix_mask; ones = (lb) & common_prefix_mask; } uint64_t getPopcount(uint64_t value) { uint64_t cnt = 0; asm volatile ("popcntq %1 , %0 \n\t" : "=r"(cnt) : "r"(value) : "cc"); return cnt; } int main(int argc, char * argv[]) { if (argc != 5) { return printf("Unexpected input ! 
\n"); } long cnt = 0; long mask_lo = atol(argv[1]); long mask_hi = atol(argv[2]); long src_lo = atol(argv[3]); long src_hi = atol(argv[4]); printf("mask.lo = %ld \n", mask_lo); printf("mask.hi = %ld \n", mask_hi); printf("src.lo = %ld \n", src_lo); printf("src.hi = %ld \n", src_hi); KnownBitsLattice mask_bits(mask_lo, mask_hi); printf("mask.bits.zeros = 0x%lx \n", mask_bits.getKnownZeros()); printf("mask.bits.zeros_count = %ld \n", mask_bits.getKnownZerosCount()); printf("mask.bits.ones = 0x%lx \n", mask_bits.getKnownOnes()); printf("mask.bits.ones_count = %ld \n", mask_bits.getKnownOnesCount()); assert(!mask_bits.check_voilation()); KnownBitsLattice src_bits(src_lo, src_hi); printf("src.bits.zeros = 0x%lx \n", src_bits.getKnownZeros()); printf("src.bits.zeros_count = %ld \n", src_bits.getKnownZerosCount()); printf("src.bits.ones = 0x%lx \n", src_bits.getKnownOnes()); printf("src.bits.ones_count = %ld \n", src_bits.getKnownOnesCount()); assert(!src_bits.check_voilation()); // Bit compression selects the source bits corresponding to true mask bits, // packs them and places them contiguously at destination bit positions // starting from least significant bit, remaining higher order bits are set // to zero. // In order to compute optimistic_upper_bound, barring MSB bit all unset known.zero bits // can be assumed to be set to 1, also we can assume corresponding source bits are set to 1, // thereby resulting into a max int value, else compute the upper bound through popcount of // flipped known zero bits. uint64_t bit_compress_optimistic_upper_bound = mask_bits.getKnownZerosCount() == 0 ? 0x7FFFFFFFFFFFFFFF : (1UL << (64 - mask_bits.getKnownZerosCount())) - 1; // Q. For bit compression, can we find maximum value less than optimistic_upper_bound where we assume // all the bits corresponding to source.knownbits.ones are set ? // A. Yes, again by taking into consideration source.knownbits.zeros we can find a maximum value less than // optimistic_upper_bound. 
Bit compression picks the source bits corresponding to set mask bits, packs // them and places them at destination bit positions starting from least significant bit. // Reset optimistic_upper_bound bits corresponding to set mask bits where source knownbits.zeros is set to 1. auto src_zeros = src_bits.getKnownZeros(); auto constrained_mask = mask_bits.getKnownZerosCount() == 0 ? 0x7FFFFFFFFFFFFFFF : ~mask_bits.getKnownZeros(); constrained_mask = constrained_mask & ~src_zeros; uint64_t constrained_bit_compress_upper_bound = (1UL << getPopcount(constrained_mask)) - 1; // In order to compute optimistic_lower_bound, we refer mask.knownbits.ones, if all the bits are set then // minimum value is computed by assuming all but MSB bits as zero, else minimum value will always be a non-negative // value, this is based on assumption that source bits corresponding to set mask bits were zero. uint64_t bit_compress_optimistic_lower_bound = mask_bits.getKnownOnesCount() == 64 ? 0x8000000000000000 : 0; // Q. For bit compression, can we find a minimum value greater than optimistic_lower_bound // A. Yes, optimistic_lower_bound for mask with knownbits.ones.cnt as 64 is minimum int value which is based on the assumption // that all source bits corresponding to true mask bits barring most significant bit are set to 0. By consulting // source.knownbits.ones we can find a value greater than optimistic_lower_bound // e.g. // mask.knownbits.ones = 0xFFFFFFFFFFFFFFFF (-1) // optimistic_lower_bound = 0x8000000000000000 which assume that all but MSB bits are set to zero. // if // source.knownbits.ones = 0xF0, i.e. 
bit 4-7 are guaranteed ones // then // result.lo = 0x80000000000000F0 which is greater than optimistic_lower_bound uint64_t constrained_bit_compress_lower_bound = bit_compress_optimistic_lower_bound; if (bit_compress_optimistic_lower_bound < 0) { constrained_bit_compress_lower_bound |= mask_bits.getKnownOnes() & src_bits.getKnownOnes(); } else { constrained_bit_compress_lower_bound = (1UL << getPopcount(mask_bits.getKnownOnes() & src_bits.getKnownOnes())) - 1; } // Bit expansion is a reverse process, which sequentially reads source bits // starting from LSB and places them at bit positions in result value where // corresponding mask bits are 1. Thus, bit expansion for non-negative mask // value will always generate a +ve value, this is because sign bit of result // will never be set to 1 as corresponding mask bit is always 0. // To compute optimistic upper bound for bit expansion, we assume all but last read source bit to be one, // number of source bits read equals popcount of mask value. uint64_t bit_expansion_optimistic_upper_bound = (~mask_bits.getKnownZeros() | mask_bits.getKnownOnes()) & ~0x8000000000000000; // Q. For bit expansion can we find an upper bound lesser than optimistic_upper_bound ? // A. Yes, we can find a maximum value lower than optimistic upper bound by taking into consideration // source.knownbits.zeros bits, if any of the lower order n source bits where n equals popcount of mask // are zero then we are sure to find a maximum value less than optimistic upper bound // e.g. // mask = (~mask_bits.zero | mask_bits.ones) = 0xFF00FF00 // optimistic_upper_bound = 0xFF00FF00 // if (source.knownbits.zero | ~source.knownbits.ones) = 0xFF // then lower order 8 bits of src are always set to 0, thus bit expansion which reads // 16 least significant source bits, only assumes upper 8 bits to be 1. 
// constrained_upper_bound = 0xFF000000 uint64_t num_lower_order_source_bits_read = getPopcount(bit_expansion_optimistic_upper_bound); uint64_t lower_order_source_bits = ~((src_bits.getKnownZeros() | ~src_bits.getKnownOnes()) & ((1UL << num_lower_order_source_bits_read) - 1)); uint64_t constrained_bit_expansion_upper_bound = 0; asm volatile ("pdepq %2 , %1, %0 \n\t" : "=r"(constrained_bit_expansion_upper_bound) : "r"(lower_order_source_bits) , "r"(bit_expansion_optimistic_upper_bound) : "cc"); // Q. Can we use mask_bits.knownbits.ones.MSB to ascertain a -ve result ? // A. Since the result depends on the lower order source bit values and expansion simply scatters those // bits at result bit positions corresponding to set mask bits, hence just based on set MSB we cannot // guarantee a -ve result, also if source.knownbits.ones.cnt == 64 then result is solely // a function of mask_bits ones count. if (src_bits.getKnownOnesCount() == 64) { constrained_bit_expansion_upper_bound = (~mask_bits.getKnownZeros() | mask_bits.getKnownOnes()); } // To compute optimistic lower bound for bit expansion, we check mask.knownbits.zeros.MSB bit, // if it's set, bit expansion will always result in a non-negative value and we can consider // lowest non-negative value as the lower bound else result should be lowest integral // value i.e. 0x8000000000000000 // Q. Why do we not base our assumptions to compute optimistic_lower_bound on is_MSB_KnownOneBitsSet ? // A. This is because even if it returns false and mask.knownbits.zeros.MSB is zero actual mask value // may still have its most significant sign bit set. // Thus golden rule to check for non-negative number is mask.knownbits.zero.MSB should be set. uint64_t bit_expansion_optimistic_lower_bound = mask_bits.is_MSB_KnownZeroBitsSet() ? 0 : 0x8000000000000000; // Q. For bit expansion, can we find a minimum value greater than optimistic lower bound ? // A.
Yes, it can be done by first computing the knownbits from source value range, then consider // lower order source bit expansion. // e.g. // mask.knownbits.ones = 0x000000000000F0F0 // mask.knownbits.zeros = 0x8000000000000000 // // Here, mask.knownbits.zeros.MSB is 1, this means result will always be a non-negative value, and optimistic // lower bound can be assumed to be minimum non-negative value i.e. 0 // // optimistic_lower_bound = 0 // max_num_lower_order_source_bits_read = getPopcount((~mask.knownbits.zeros | mask.knownbits.ones)) // source.knownbits.ones = 0x0F00 // source.knownbits.zeros = 0x00F0 // // source_one_bits = (source.knownbits.ones | ~source.knownbits.zeros) // = 0xF00 | ~0x00F0 // = 0xF00 | 0xFF0F // = 0xFF0F // // Consider first set knownonebits with bit index less than max_num_lower_order_source_bits_read // to compute next lower value greater than optimistic_lower_bound 0 // // Thus, minimum lower bound greater than optimistic_lower_bound is 0x001. // // If mask.knownbits.ones.MSB is 1, then result may a -ve value, I am saying maybe because it depends on the last read source bit // if its 1 then results is a -ve value else not. Last read source bit depends on the popcount of actual mask, with mask.knownbits.ones // we can only partially determine number of set mask bits, remaining bits i.e. ~mask.knownbits.zeros are unknown at // compile time. Thus, its not possible to make any assumption based on unknown mask popcount. // // Overall, KnownBits information help us constrain optimistic value range bounds. 
uint64_t constrained_bit_expansion_lower_bound = bit_expansion_optimistic_lower_bound; // Try to find lower bound greater than optimistic_lower_bound if (mask_bits.is_MSB_KnownZeroBitsSet()) { uint64_t source_one_bits = src_bits.getKnownOnes() | ~src_bits.getKnownZeros(); uint64_t first_set_bit = 0; asm volatile ("bsfq %1 , %0 \n\t" : "=r"(first_set_bit) : "r"(source_one_bits) : "cc"); constrained_bit_expansion_lower_bound |= (1UL << first_set_bit); } printf("\nbit_compress_optimistic_upper_bound = %lx \n", bit_compress_optimistic_upper_bound); printf("constrained_bit_compress_upper_bound= %lx \n", constrained_bit_compress_upper_bound); printf("bit_compress_optimistic_lower_bound = %lx \n", bit_compress_optimistic_lower_bound); printf("constrained_bit_compress_lower_bound = %lx \n", constrained_bit_compress_lower_bound); printf("bit_expansion_optimistic_upper_bound = %lx \n", bit_expansion_optimistic_upper_bound); printf("constrained_bit_expansion_upper_bound= %lx \n", constrained_bit_expansion_upper_bound); printf("bit_expansion_optimistic_lower_bound = %lx \n", bit_expansion_optimistic_lower_bound); printf("constrained_bit_expansion_lower_bound = %lx \n", constrained_bit_expansion_lower_bound); assert(bit_compress_optimistic_upper_bound >= constrained_bit_compress_upper_bound); assert(bit_compress_optimistic_lower_bound <= constrained_bit_compress_lower_bound); assert(bit_expansion_optimistic_upper_bound >= constrained_bit_expansion_upper_bound); assert(bit_expansion_optimistic_lower_bound <= constrained_bit_expansion_lower_bound); return 0; } ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2923008281 From sviswanathan at openjdk.org Fri May 30 17:56:52 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 30 May 2025 17:56:52 GMT Subject: RFR: 8357982: Fix several failing BMI tests with -XX:+UseAPX In-Reply-To: References: Message-ID: On Wed, 28 May 2025 16:21:56 GMT, Jatin Bhateja wrote: > A) Patch extends the 
following tests with hard-coded encoding checks for various BMI instructions to cover REX2 or extended EVEX encodings supported by APX. > > > compiler/intrinsics/bmi/verifycode/AndnTestI.java > compiler/intrinsics/bmi/verifycode/AndnTestL.java > compiler/intrinsics/bmi/verifycode/BzhiTestI2L.java > compiler/intrinsics/bmi/verifycode/LZcntTestL.java > compiler/intrinsics/bmi/verifycode/TZcntTestL.java > > > B) After integration of JDK-8349582, which added APX NDD support, AndN instruction selection patterns that expect (Xor SRC, -1) as one of its operands were not getting selected because of a lower-cost generic immediate pattern match; patch fixes this issue through strict predicate checks. > > Above tests are now passing, validations were carried out using Intel Software Development emulator. > > Kindly review and share your feedback. > > Best Regards, > Jatin How about the other tests in the same directory: Blsi, Blsr, Blsmsk, LZcnti, TZcnti? They also need the APX encoding with higher bank register usage. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25501#issuecomment-2923036492 From sviswanathan at openjdk.org Fri May 30 18:08:59 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 30 May 2025 18:08:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v36] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 20:48:38 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. 
>> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactor is_P6_or_later and remove cpu_family==18 We plan to integrate this PR on Monday June 2nd so as to not get very close to the upcoming fork next Thursday. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2923060837 From never at openjdk.org Fri May 30 19:29:01 2025 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 30 May 2025 19:29:01 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Thu, 29 May 2025 23:18:43 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: > > - Add requires GC to tests > - Add type for immutable_data_references > - Fix incorrect destination set if no trampoline available > - Update assert note in nmethod::clear_inline_caches So this copying keeps the same compile_id, which sort of makes sense but it's also potentially confusing. What's the plan for how this interacts with flags like PrintNMethods and JVMTI code installation notification? This is done in nmethod::post_compiled_method which doesn't seem to be used on the new nmethod. If the reclamation of the old nmethod is performed in the normal fashion, we now have 2 nmethods alive with the same compile_id which could be confusing. But allocating a new compile_id breaks the connection to the original compile which seems bad too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23573#issuecomment-2923296912 From chagedorn at openjdk.org Fri May 30 20:38:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 30 May 2025 20:38:54 GMT Subject: RFR: 8354930: IGV: dump C2 graph before and after live range stretching In-Reply-To: References: Message-ID: On Wed, 28 May 2025 11:54:24 GMT, Manuel Hässig wrote: > This PR introduces a new phase `LIVE_RANGE_STRETCHING` that prints after live ranges have been stretched, if that happens at all. The phase `INITIAL_LIVENESS` is moved before live range stretching so we can compare the live ranges before and after stretching in IGV, which is useful for debugging why an oop suddenly belongs to an oop map. > > ## Testing > > - [x] [Github Actions](https://github.com/mhaessig/jdk/actions/runs/15299362485) > - [x] tier1 and tier1, plus additional Oracle internal testing for all Oracle supported platforms and OSs > - [x] verified that the new phase prints when it should in IGV and with `-XX:PrintPhaseLevel=4` Looks good to me, too.
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/25492#pullrequestreview-2882457988 From eastigeevich at openjdk.org Fri May 30 20:39:04 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Fri, 30 May 2025 20:39:04 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v19] In-Reply-To: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> References: <17al0aeFhm0iZHoHHGiqB03RfPeSrIHIoZuapOHPuy4=.a2ff2d67-392b-40f0-b6d9-6e3a7f396e8a@github.com> Message-ID: On Thu, 29 May 2025 23:18:43 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: > > - Add requires GC to tests > - Add type for immutable_data_references > - Fix incorrect destination set if no trampoline available > - Update assert note in nmethod::clear_inline_caches src/hotspot/share/code/nmethod.hpp line 502: > 500: // Relocate the nmethod to the code heap identified by code_blob_type. > 501: // Returns nullptr if the code heap does not have enough space or the > 502: // nmethod is unrelocatable, otherwise the relocated nmethod. ... 
nmethod is unrelocatable, or nmethod is invalidated during relocation, otherwise ... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2116601704 From kvn at openjdk.org Fri May 30 21:04:50 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 May 2025 21:04:50 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: On Thu, 29 May 2025 18:45:11 GMT, Vladimir Kozlov wrote: > By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. > > Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. > > I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. > > The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. 
> > Tested: tier1-10 @ashu-mehra please look ------------- PR Comment: https://git.openjdk.org/jdk/pull/25525#issuecomment-2923475844 From duke at openjdk.org Fri May 30 22:37:30 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 30 May 2025 22:37:30 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v20] In-Reply-To: References: Message-ID: <3Dhpf3MQZrowhxUynTzzHEEsOY2dancWXtTLZH0-wRE=.e14282b6-060f-4c22-8fc4-2d4a5a8c0ec6@github.com> > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with four additional commits since the last revision: - Move ICache::invalidate_range - Fix comment - Small fix - Remove gc on allocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/c5ff58f4..6d053dc0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=18-19 Stats: 27 lines in 2 files changed: 13 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Fri May 30 22:49:42 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 30 May 2025 22:49:42 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: > This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). > > When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. > > This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: Change to ImmutableDataReferences ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23573/files - new: https://git.openjdk.org/jdk/pull/23573/files/6d053dc0..9f753071 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23573&range=19-20 Stats: 10 lines in 2 files changed: 2 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23573.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23573/head:pull/23573 PR: https://git.openjdk.org/jdk/pull/23573 From duke at openjdk.org Fri May 30 22:55:05 2025 From: duke at openjdk.org (Chad Rakoczy) Date: Fri, 30 May 2025 22:55:05 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: <3lvuXGbqkCDeGwkzDQtzhkbZGN1XgcTcuFfL0_TUPvA=.4ba152ea-b3ab-4339-a42c-03d78bfcc829@github.com> References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> <3lvuXGbqkCDeGwkzDQtzhkbZGN1XgcTcuFfL0_TUPvA=.4ba152ea-b3ab-4339-a42c-03d78bfcc829@github.com> Message-ID: On Fri, 30 May 2025 01:17:49 GMT, Vladimir Kozlov wrote: >> I still have that concern for **mutable** data which includes relocations which accessed frequently. >> >> I don't think accessing **immutable** is performance critical. They mostly accessed during deoptimization and from JVMTI. > > Actually even **mutable** is not critical since we did not include oops data section (we keep it with nmethod). It currently contains relocations, metadata (klass*, method*) and JVMCI data. We can experiment in separate RFE if we can use separate class for it too. I have created a RFE to move the immutable data from the nmethod to a separate class. 
[JDK-8358213](https://bugs.openjdk.org/browse/JDK-8358213) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2116837278 From kvn at openjdk.org Fri May 30 23:23:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 30 May 2025 23:23:58 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v3] In-Reply-To: References: <8l4e6nqzNukJ6st0fEkLwKqlF35stq_W9ph831eo8w4=.6cbb2172-b35a-4d27-bab7-1d104c9f993b@github.com> <3lvuXGbqkCDeGwkzDQtzhkbZGN1XgcTcuFfL0_TUPvA=.4ba152ea-b3ab-4339-a42c-03d78bfcc829@github.com> Message-ID: On Fri, 30 May 2025 22:51:53 GMT, Chad Rakoczy wrote: >> Actually even **mutable** is not critical since we did not include oops data section (we keep it with nmethod). It currently contains relocations, metadata (klass*, method*) and JVMCI data. We can experiment in separate RFE if we can use separate class for it too. > > I have created a RFE to move the immutable data from the nmethod to a separate class. [JDK-8358213](https://bugs.openjdk.org/browse/JDK-8358213) Thanks. 
Let's keep current changes as it is with small comment: IMMUTABLE_DATA_REFERENCES is used with `sizeof()` in all places - consider using instead ``` #define IMMUTABLE_DATA_REFERENCES_SIZE sizeof(int) ``` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2116857046 From dlong at openjdk.org Sat May 31 00:31:54 2025 From: dlong at openjdk.org (Dean Long) Date: Sat, 31 May 2025 00:31:54 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: <0Om_yBwlW-qu-buvKTqJZy2STNRMxgms1b_XLgFjRq0=.3e900e23-5108-4206-988d-bbaa175994de@github.com> References: <0Om_yBwlW-qu-buvKTqJZy2STNRMxgms1b_XLgFjRq0=.3e900e23-5108-4206-988d-bbaa175994de@github.com> Message-ID: <50cNhVps-VSpbPZrVNerO5gSCjYwaesL3FkwoPHRToU=.8ac7af59-d12e-4e39-9739-32c6e1fd63b8@github.com> On Fri, 30 May 2025 15:36:30 GMT, Marc Chevalier wrote: > Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate? I would say, change the fixVersion to 25 and try to get this into 25, resulting it one less backport needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2923797883 From dbriemann at openjdk.org Sat May 31 02:49:58 2025 From: dbriemann at openjdk.org (David Briemann) Date: Sat, 31 May 2025 02:49:58 GMT Subject: Integrated: 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes In-Reply-To: References: Message-ID: <6RqeUqJQ6wX79oBlLoffmyJxbroucySwaX0p3zlJdlM=.2f137da4-ddc5-4f97-9337-a8896215b0c0@github.com> On Tue, 20 May 2025 06:47:45 GMT, David Briemann wrote: > The following nodes are added: > - MinV / MaxV > - AndV / OrV / XorV > - MinReductionV / MaxReductionV / AndReductionV / OrReductionV / XorReductionV > - AddReductionVI / MulReductionVI This pull request has now been integrated. 
Changeset: 061b24d4 Author: David Briemann Committer: SendaoYan URL: https://git.openjdk.org/jdk/commit/061b24d4f9d8635944683766532e9252c3ba0152 Stats: 213 lines in 6 files changed: 213 ins; 0 del; 0 mod 8357304: [PPC64] C2: Implement MinV, MaxV and Reduction nodes Reviewed-by: mdoerr, varadam ------------- PR: https://git.openjdk.org/jdk/pull/25318 From syan at openjdk.org Sat May 31 03:03:00 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 31 May 2025 03:03:00 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier wrote: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] > ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? 
It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing > > mov v16.s[0], w11 > > would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which > > mov w11, w11 > mov v16.d[0], x11 > > does, but by destroying `x11`. > > ### Solution > > Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. > > The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: > > mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 > mov v16.s[0], w11 ; Set the 0-ind... Hi, how does this bug was found, seems the original testcase generated by a fuzz tool. 
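The register clobbering described in the quoted 8353266 text can be modelled in plain C++. This is only a sketch, not HotSpot code: `Regs`, `popcount_clobbering` and `popcount_preserving` are hypothetical names, `x11` and `v16_lo` stand in for the registers in the quoted disassembly, and `std::bitset::count` stands in for the `cnt`/`addv` pair. It assumes the fix writes the two 32-bit lanes separately, as the quoted description says.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical model of the two AArch64 registers from the quoted disassembly.
struct Regs {
    uint64_t x11;     // general-purpose register holding the 64-bit field value
    uint64_t v16_lo;  // low 64 bits of vector register v16, i.e. v16.d[0]
};

// Old sequence: "mov w11, w11" zero-extends in place, so x11 loses its upper word.
inline uint32_t popcount_clobbering(Regs& r) {
    r.x11 = static_cast<uint32_t>(r.x11);  // mov w11, w11
    r.v16_lo = r.x11;                      // mov v16.d[0], x11
    // cnt + addv over the 8 low bytes of v16
    return static_cast<uint32_t>(std::bitset<64>(r.v16_lo).count());
}

// Sequence from the quoted description: fill the two 32-bit lanes separately
// and never write to x11.
inline uint32_t popcount_preserving(Regs& r) {
    r.v16_lo &= 0x00000000FFFFFFFFULL;               // mov v16.s[1], wzr
    r.v16_lo = (r.v16_lo & 0xFFFFFFFF00000000ULL) |
               static_cast<uint32_t>(r.x11);         // mov v16.s[0], w11
    return static_cast<uint32_t>(std::bitset<64>(r.v16_lo).count());
}
```

With `x11 = 0xfedcba9876543210` both functions count the bits of the low word `0x76543210`, but only the first leaves `x11` truncated to `0x76543210`, which is the value the regression test reports as the wrong result.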
------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2924073433 From syan at openjdk.org Sat May 31 03:13:53 2025 From: syan at openjdk.org (SendaoYan) Date: Sat, 31 May 2025 03:13:53 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier wrote: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] > ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). 
There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with general purpose register doing > > mov v16.s[0], w11 > > would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which > > mov w11, w11 > mov v16.d[0], x11 > > does, but by destroying `x11`. > > ### Solution > > Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists. > > The way I went for is rather not to modify the source, but rather do write the two lower words of `v16` we are interested in separately: > > mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0 > mov v16.s[0], w11 ; Set the 0-ind... test/hotspot/jtreg/compiler/intrinsics/BitCountIAarch64PreservesArgument.java line 58: > 56: if (result != 0xfedc_ba98_7654_3210L) { > 57: // Wrongly outputs the cut input 0x7654_3210 == 1985229328 > 58: throw new RuntimeException("Wrong result. lFld=" + lFld + "; result=" + result); How about: throw new RuntimeException("Wrong result. 
Expected result = " + lFld + "; Actual result = " + result); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25551#discussion_r2117132388 From eastigeevich at openjdk.org Sat May 31 10:10:59 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sat, 31 May 2025 10:10:59 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 22:49:42 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Change to ImmutableDataReferences src/hotspot/share/code/nmethod.cpp line 1572: > 1570: > 1571: // Verify the nm we copied from is still valid > 1572: if (method() != nullptr && method()->code() == this && !is_marked_for_deoptimization() && is_in_use()) { We can turn `method() != nullptr && method()->code() == this` into an assert. If `is_in_use()` returns true they should be true as well. 
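The suggestion above could look roughly like the following. This is a sketch under assumptions: `MethodModel` and `NMethodModel` are hypothetical stand-ins for HotSpot's `Method` and `nmethod`, and it takes the reviewer's claim (an in-use nmethod is always the installed code of its method) as given rather than verifying it. The state checks run first so the assertion is only evaluated for nmethods that are in use.

```cpp
#include <cassert>

// Hypothetical stand-ins for HotSpot's Method and nmethod; only what the check needs.
struct NMethodModel;

struct MethodModel {
    const NMethodModel* code = nullptr;  // models Method::code()
};

struct NMethodModel {
    const MethodModel* method = nullptr;     // models nmethod::method()
    bool in_use = false;                     // models is_in_use()
    bool marked_for_deoptimization = false;  // models is_marked_for_deoptimization()

    // Shape of the quoted condition: all four parts are tested at runtime.
    bool valid_runtime_checks() const {
        return method != nullptr && method->code == this &&
               !marked_for_deoptimization && in_use;
    }

    // Suggested shape: test the state flags, then assert the back-link.
    bool valid_with_assert() const {
        if (marked_for_deoptimization || !in_use) {
            return false;
        }
        // Assumed invariant from the review comment, checked in debug builds only.
        assert(method != nullptr && method->code == this);
        return true;
    }
};
```

Both variants return the same answer as long as the assumed invariant holds; the second one fails loudly in debug builds if it does not.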
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2117669694 From eastigeevich at openjdk.org Sat May 31 10:26:05 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sat, 31 May 2025 10:26:05 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 22:49:42 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Change to ImmutableDataReferences src/hotspot/share/code/nmethod.hpp line 172: > 170: friend class DeoptimizationScope; > 171: > 172: using ImmutableDataReferences = int; Sorry, I might look too annoying. Let's be more specific. This type represents not references themself. It is a counter of them: `ImmutableDataReferenceCounter` Let's reflect this in all related names. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2117682837 From epeter at openjdk.org Sat May 31 11:12:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:12:54 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v62] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/3c4e1ce2..981330d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=61 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=60-61 Stats: 13 lines in 1 file changed: 1 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Sat May 31 11:12:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:12:55 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 07:28:44 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> 
- Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 696: > >> 694: * rendering a template with {@code render(fuel)} (e.g. {@link ZeroArgs#render(float)}). >> 695: */ >> 696: static final float DEFAULT_FUEL = 100.0f; > > Fields defined in an interface are implicitly static and final > Suggestion: > > float DEFAULT_FUEL = 100.0f; @manuel suggested it.... but you are right, it is unnecessary! https://www.tutorialspoint.com/interface-variables-are-static-and-final-by-default-in-java-why ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117711864 From epeter at openjdk.org Sat May 31 11:25:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:25:22 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v63] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. 
> > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with three additional commits since the last revision: - more from Christian - Update test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java Co-authored-by: Christian Hagedorn - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/981330d1..4624d307 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=62 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=61-62 Stats: 30 lines in 4 files changed: 4 ins; 4 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Sat May 31 11:25:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:25:23 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 06:07:00 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 160: > >> 158: * Ideally, we would have used string templates to inject these Template arguments into the strings. >> 159: * But since string templates are not (yet) available, the Templates provide hashtag replacements >> 160: > > These paragraphs should probably belong together? And maybe you want to wrap the long line above. > Suggestion: Nice catch! Artifact from applying previous suggestion. Fixed now :) > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 174: > >> 172: *

>> 173: * The dollar and hashtag names must have at least one character. The first character must be a letter >> 174: * or underscore (i.e. {@code a-zA-Z_}), the other characters can also be digits (i.e. {@code a-zA-Z0-9_}). > > This does not seem to be enforced: > > var testTemplate = Template.make(() -> body( > """ > public class Foo { > public static void main() { > int $1var = 34; > } > } > """ > )); > System.out.println(testTemplate.render()); > > Results in: > > public class Foo { > public static void main() { > int $1var = 34; > } > } > > which compiles fine. I can also change it to `$$var` which renders to `$var_1` which also compiles fine. Oh, I did not think about these edge cases! Nice catch. The problem is with the regex. The regex only finds valid cases, so your cases are not found. I can change the regex to find also invalid cases, and then validate them. I'll have to add some more tests for this as well! > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java line 61: > >> 59: public class TestAdvanced { >> 60: private static final Random RANDOM = Utils.getRandomInstance(); >> 61: > > Unused > Suggestion: Fixed, also removed the include. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117725006 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117725778 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117725151 From epeter at openjdk.org Sat May 31 11:28:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:28:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: <_9v8HAVi-VCgq9pHbk92s9XUdtk2kdq3xo_wWl17D6k=.79f6cbac-3df4-402b-9b32-5a607a3e6653@github.com> On Sat, 31 May 2025 11:22:21 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 174: >> >>> 172: *

>>> 173: * The dollar and hashtag names must have at least one character. The first character must be a letter >>> 174: * or underscore (i.e. {@code a-zA-Z_}), the other characters can also be digits (i.e. {@code a-zA-Z0-9_}). >> >> This does not seem to be enforced: >> >> var testTemplate = Template.make(() -> body( >> """ >> public class Foo { >> public static void main() { >> int $1var = 34; >> } >> } >> """ >> )); >> System.out.println(testTemplate.render()); >> >> Results in: >> >> public class Foo { >> public static void main() { >> int $1var = 34; >> } >> } >> >> which compiles fine. I can also change it to `$$var` which renders to `$var_1` which also compiles fine. > > Oh, I did not think about these edge cases! Nice catch. > The problem is with the regex. The regex only finds valid cases, so your cases are not found. I can change the regex to find also invalid cases, and then validate them. I'll have to add some more tests for this as well! This is what I already had, but they only validate the `let` and `$` calls, not the string replacements that use the regex. ![image](https://github.com/user-attachments/assets/936d9027-d794-4fa5-9ace-58d3b7bc69d0) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117727666 From epeter at openjdk.org Sat May 31 11:45:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:45:08 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 06:32:54 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 174: > >> 172: *

>> 173:      * The dollar and hashtag names must have at least one character. The first character must be a letter
>> 174:      * or underscore (i.e. {@code a-zA-Z_}), the other characters can also be digits (i.e. {@code a-zA-Z0-9_}).
>
> This does not seem to be enforced:
>
> var testTemplate = Template.make(() -> body(
>     """
>     public class Foo {
>         public static void main() {
>             int $1var = 34;
>         }
>     }
>     """
> ));
> System.out.println(testTemplate.render());
>
> Results in:
>
> public class Foo {
>     public static void main() {
>         int $1var = 34;
>     }
> }
>
> which compiles fine. I can also change it to `$$var` which renders to `$var_1` which also compiles fine.

@chhagedorn
Gosh, it is actually not so simple to write a good regex here that also parses the "bad" cases.
Consider this:
`#{abc}`
This is now ambiguous: should we parse `#` as an empty name, and `{abc}` as unrelated? Or is it a bracketed name `abc`?

I could first parse all the bracket patterns, and then the non-bracket patterns. But what if someone does a bracket replacement, and inserts a non-bracket pattern? Then we might end up re-parsing the inserted String, and trying to replace there again. What a mess!
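The re-parsing hazard described above can be sketched in plain Java. This is only an illustration under stated assumptions: the patterns `BRACKETED`, `PLAIN` and `COMBINED`, the class name, and the value map are hypothetical and are not the regexes or data structures the Template Framework actually uses.

```java
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HashtagPassSketch {
    // Hypothetical patterns for illustration only (not the framework's regex).
    static final String NAME = "[a-zA-Z_][a-zA-Z0-9_]*";
    static final Pattern BRACKETED = Pattern.compile("#\\{(" + NAME + ")\\}");
    static final Pattern PLAIN = Pattern.compile("#(" + NAME + ")");
    // Both forms in one pattern, bracketed alternative first.
    static final Pattern COMBINED =
        Pattern.compile("#(?:\\{(" + NAME + ")\\}|(" + NAME + "))");

    // Returns whichever capture group holds the name.
    static String nameOf(MatchResult m) {
        for (int i = 1; i <= m.groupCount(); i++) {
            if (m.group(i) != null) {
                return m.group(i);
            }
        }
        throw new IllegalStateException("no name group matched");
    }

    // Replaces every match whose name has a value; unknown names stay as-is.
    static String replace(Pattern p, String s, Map<String, String> values) {
        return p.matcher(s).replaceAll(m -> {
            String value = values.get(nameOf(m));
            return Matcher.quoteReplacement(value != null ? value : m.group());
        });
    }

    // Bracketed pass first, then plain pass: inserted text is scanned again.
    static String twoPass(String s, Map<String, String> values) {
        return replace(PLAIN, replace(BRACKETED, s, values), values);
    }

    // One scan over the input: inserted text is never scanned again.
    static String singlePass(String s, Map<String, String> values) {
        return replace(COMBINED, s, values);
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("a", "#b", "b", "1");
        System.out.println(twoPass("#{a} + #b", values));    // 1 + 1
        System.out.println(singlePass("#{a} + #b", values)); // #b + 1
    }
}
```

In the two-pass variant the value inserted for `#{a}` is itself rewritten by the second pass. A single scan avoids that, but it does not by itself help with validating "bad" patterns such as `#` with an empty name.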
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117739309 From epeter at openjdk.org Sat May 31 11:51:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:51:09 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: <4MgAjHfzurYkWqrZ6ah81SwKah7IHR7okOxnq5gapb8=.b7b7bfc8-6dd7-4186-9839-b446c86f21a3@github.com> On Fri, 30 May 2025 06:32:54 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 174: > >> 172: *

>> 173:      * The dollar and hashtag names must have at least one character. The first character must be a letter
>> 174:      * or underscore (i.e. {@code a-zA-Z_}), the other characters can also be digits (i.e. {@code a-zA-Z0-9_}).
>
> This does not seem to be enforced:
>
> var testTemplate = Template.make(() -> body(
>     """
>     public class Foo {
>         public static void main() {
>             int $1var = 34;
>         }
>     }
>     """
> ));
> System.out.println(testTemplate.render());
>
> Results in:
>
> public class Foo {
>     public static void main() {
>         int $1var = 34;
>     }
> }
>
> which compiles fine. I can also change it to `$$var` which renders to `$var_1` which also compiles fine.

@chhagedorn
The current parsing/regex-ing is relatively simple. We only parse the "valid" cases, so the description above is still relevant.
Your example `$1var` is not a valid pattern, so the regex does not match, and there is no replacement. Sadly, in Java `$1var` is a valid variable name, so there is some chance that the user makes a mistake and gets tripped up by this.

If the user calls `let` or `$` with such a bad string `1var`, then they get a `RendererException`.

The question is this:
Should I really try to parse these "bad" patterns, just to validate them as well? All solutions I can think of are really complicated. Is it worth it? Or is it just a mistake by the user, so the matching does not happen, and that is the user's problem?
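For reference, the naming rule quoted from the Javadoc (first character a letter or underscore, further characters may also be digits) can be written as a small check. This is a sketch only: the class and method names are made up, and `IllegalArgumentException` stands in for the framework's `RendererException`, whose constructor is not shown in this thread.

```java
import java.util.regex.Pattern;

final class NameCheckSketch {
    // The rule from the quoted Javadoc: [a-zA-Z_] followed by [a-zA-Z0-9_]*.
    static final Pattern VALID_NAME = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");

    static boolean isValidName(String name) {
        // matches() requires the whole string to fit the pattern.
        return VALID_NAME.matcher(name).matches();
    }

    // Hypothetical stand-in for the validation done on let/$ calls.
    static String checkName(String name) {
        if (!isValidName(name)) {
            throw new IllegalArgumentException("Invalid name: '" + name + "'");
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("var"));   // true
        System.out.println(isValidName("_tmp1")); // true
        System.out.println(isValidName("1var"));  // false
        System.out.println(isValidName(""));      // false
    }
}
```

A check like this works when the name arrives as a separate string. The difficulty discussed above is that inside a code string the "bad" candidates first have to be found before they can be checked.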
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117740333 From epeter at openjdk.org Sat May 31 11:51:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 11:51:09 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: <4MgAjHfzurYkWqrZ6ah81SwKah7IHR7okOxnq5gapb8=.b7b7bfc8-6dd7-4186-9839-b446c86f21a3@github.com> References: <4MgAjHfzurYkWqrZ6ah81SwKah7IHR7okOxnq5gapb8=.b7b7bfc8-6dd7-4186-9839-b446c86f21a3@github.com> Message-ID: On Sat, 31 May 2025 11:47:26 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 174: >> >>> 172: *

>>> 173:      * The dollar and hashtag names must have at least one character. The first character must be a letter
>>> 174:      * or underscore (i.e. {@code a-zA-Z_}), the other characters can also be digits (i.e. {@code a-zA-Z0-9_}).
>>
>> This does not seem to be enforced:
>>
>> var testTemplate = Template.make(() -> body(
>>     """
>>     public class Foo {
>>         public static void main() {
>>             int $1var = 34;
>>         }
>>     }
>>     """
>> ));
>> System.out.println(testTemplate.render());
>>
>> Results in:
>>
>> public class Foo {
>>     public static void main() {
>>         int $1var = 34;
>>     }
>> }
>>
>> which compiles fine. I can also change it to `$$var` which renders to `$var_1` which also compiles fine.
>
> @chhagedorn
> The current parsing/regex-ing is relatively simple. We only parse the "valid" cases, so the description above is still relevant.
> Your example `$1var` is not a valid pattern, so the regex does not match, and there is no replacement. Sadly, in Java `$1var` is a valid variable name, so there is some chance that the user makes a mistake and gets tripped up by this.
>
> If the user calls `let` or `$` with such a bad string `1var`, then they get a `RendererException`.
>
> The question is this:
> Should I really try to parse these "bad" patterns, just to validate them as well? All solutions I can think of are really complicated. Is it worth it? Or is it just a mistake by the user, so the matching does not happen, and that is the user's problem?

FYI: in `$$var`, the first `$` is not a valid pattern, so it is not replaced. But `$var` is, and so that part gets replaced. The result is `$var_1`, which sadly happens to also be valid Java code.
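The behaviour for `$1var` and `$$var` can be reproduced with a regex that only matches valid names. This is a minimal sketch: the pattern `DOLLAR_NAME`, the `render` helper and the `_id` suffix scheme are assumptions made for illustration, not the framework's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DollarReplaceSketch {
    // Hypothetical pattern: '$' followed by a valid name. Invalid names
    // simply do not match, so they are left untouched.
    static final Pattern DOLLAR_NAME =
        Pattern.compile("\\$([a-zA-Z_][a-zA-Z0-9_]*)");

    // Replaces each $name with name_<id>, mimicking unique variable names.
    static String render(String code, int id) {
        return DOLLAR_NAME.matcher(code).replaceAll(
            m -> Matcher.quoteReplacement(m.group(1) + "_" + id));
    }

    public static void main(String[] args) {
        System.out.println(render("int $var = 34;", 1));  // int var_1 = 34;
        System.out.println(render("int $1var = 34;", 1)); // int $1var = 34;
        System.out.println(render("int $$var = 34;", 1)); // int $var_1 = 34;
    }
}
```

In the last case the scan skips the first `$` because no name starts there, then matches `$var` at the second `$`, which leaves `$var_1` in the output.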
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2117740776 From aph at openjdk.org Sat May 31 14:31:50 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 31 May 2025 14:31:50 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: References: Message-ID: On Fri, 30 May 2025 15:33:14 GMT, Marc Chevalier wrote: > ### Problem > > On Aarch64, using `Integer.bitCount` can modify its argument. The problem comes from the implementation of `popCountI` on Aarch64. For instance, that's what we get with the reproducer `Reduced.java` on the related issue: > > ; Load lFld into local x > ldr x11, [x10, #120] > ; popCountI > mov w11, w11 > mov v16.d[0], x11 > cnt v16.8b, v16.8b > addv b16, v16.8b > mov x13, v16.d[0] > ; [...] > ; store local x (which is believed to still contain lFld) into result > str x11, [x10, #128] > > > The instruction `mov w11, w11` is used to cut the 32 higher bits of `x11` since we use `popCountI` (from `Integer.bitCount`): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation of `popCountI` doesn't declare it: > > instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{ > match(Set dst (PopCountI src)); > effect(TEMP tmp); > [...] > %} > > > But then, why resetting the upper word of `x11`? It all starts with vector instructions: > > cnt v16.8b, v16.8b > addv b16, v16.8b > > The `8b` specifies that it operates on the 8 lower bytes of `v16`, it would be nice to simply use `4b`, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). 
> There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so, in order not to pollute the bit count, we need to reset the 32 higher bits of `v16.d[0]` (aka `d16`), that is `v16.s[1]`, that is `v16[32:63]` in a more bit-explicit notation. Moreover, unlike with a general-purpose register, doing
>
>     mov v16.s[0], w11
>
> would set `v16[0:31]` to `w11`, but not reset `v16[32:63]`. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of `v16[0:63]`, which
>
>     mov w11, w11
>     mov v16.d[0], x11
>
> does, but by destroying `x11`.
>
> ### Solution
>
> Simply adding `USE_KILL src` in the effects would be nice, but unfortunately not possible: `iRegIorL2I` is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists.
>
> The approach I went for is not to modify the source, but rather to write the two lower words of `v16` we are interested in separately:
>
>     mov v16.s[1], wzr ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0
>     mov v16.s[0], w11 ; Set the 0-ind...

src/hotspot/cpu/aarch64/aarch64.ad line 7771:

> 7769:   ins_encode %{
> 7770:     __ mov($tmp$$FloatRegister, __ S, 1, zr);             // tmp[32:63] <- 0
> 7771:     __ mov($tmp$$FloatRegister, __ S, 0, $src$$Register); // tmp[ 0:31] <- src

"Where the entire 128-bit wide register is not fully utilized, the vector or scalar quantity is held in the least significant bits of the register, with the most significant bits being cleared to zero on a write."

Suggestion:

    __ fmovs($tmp$$FloatRegister, $src$$Register);

should do it.
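The Java-level invariant that the quoted description says is violated can be stated as a small program. This is a hypothetical sketch, not the `Reduced.java` reproducer from the JBS issue: the class, field and method names are made up, and whether this exact shape triggers the miscompilation on an affected AArch64 build is not claimed here.

```java
final class BitCountInvariantSketch {
    // Upper 32 bits are set on purpose: Integer.bitCount must ignore them
    // and must not change the long value they belong to.
    static long lFld = 0xFFFF_FFFF_0000_00FFL;
    static long result;

    static int test() {
        long x = lFld;
        int count = Integer.bitCount((int) x); // counts bits of the low word only
        result = x;                            // must still be the full 64-bit value
        return count;
    }

    public static void main(String[] args) {
        // Repeat so that the method is likely to be JIT-compiled.
        for (int i = 0; i < 20_000; i++) {
            int count = test();
            if (count != 8 || result != lFld) {
                throw new AssertionError("count=" + count
                    + " result=0x" + Long.toHexString(result));
            }
        }
        System.out.println("ok");
    }
}
```

On a correct VM the low word `0x000000FF` yields a count of 8 and `result` equals `lFld`. The bug described above would show up as `result` having its upper 32 bits cleared.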
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25551#discussion_r2117909516 From aph at openjdk.org Sat May 31 14:40:53 2025 From: aph at openjdk.org (Andrew Haley) Date: Sat, 31 May 2025 14:40:53 GMT Subject: RFR: 8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 In-Reply-To: <50cNhVps-VSpbPZrVNerO5gSCjYwaesL3FkwoPHRToU=.8ac7af59-d12e-4e39-9739-32c6e1fd63b8@github.com> References: <0Om_yBwlW-qu-buvKTqJZy2STNRMxgms1b_XLgFjRq0=.3e900e23-5108-4206-988d-bbaa175994de@github.com> <50cNhVps-VSpbPZrVNerO5gSCjYwaesL3FkwoPHRToU=.8ac7af59-d12e-4e39-9739-32c6e1fd63b8@github.com> Message-ID: On Sat, 31 May 2025 00:29:17 GMT, Dean Long wrote: > Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate? Get it in 25. Low risk, significant Java compatibility bug. ------------- PR Comment: https://git.openjdk.org/jdk/pull/25551#issuecomment-2925277211 From epeter at openjdk.org Sat May 31 16:23:58 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 16:23:58 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v36] In-Reply-To: References: Message-ID: On Thu, 22 May 2025 20:48:38 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactor is_P6_or_later and remove cpu_family==18 Looks reasonable to me, thanks for the work! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24431#pullrequestreview-2884297007 From epeter at openjdk.org Sat May 31 16:23:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 16:23:59 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> Message-ID: On Wed, 28 May 2025 20:54:04 GMT, Srinivas Vamsi Parasa wrote: >>> @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) >> >> Hi Emanuel (@eme64), >> >> Thanks for the update! The new changes got approved and are ready for testing. >> Could you please launch the tests? >> >> Thanks, >> Vamsi > >> @vamsi-parasa Launched! > > Hi Emanuel (@eme64), > > Could you pls let me know when the testing is completed? > Will integrate it if everything looks good. > > Thanks, > Vamsi @vamsi-parasa @sviswa7 Testing passed! 
We are celebrating ascension weekend, hence we may be a little less responsive than usual ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2925398999 From sviswanathan at openjdk.org Sat May 31 16:32:01 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 31 May 2025 16:32:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> Message-ID: On Wed, 28 May 2025 20:54:04 GMT, Srinivas Vamsi Parasa wrote: >>> @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) >> >> Hi Emanuel (@eme64), >> >> Thanks for the update! The new changes got approved and are ready for testing. >> Could you please launch the tests? >> >> Thanks, >> Vamsi > >> @vamsi-parasa Launched! > > Hi Emanuel (@eme64), > > Could you pls let me know when the testing is completed? > Will integrate it if everything looks good. > > Thanks, > Vamsi > @vamsi-parasa @sviswa7 Testing passed! We are celebrating ascension weekend, hence we may be a little less responsive than usual ;) Thanks a lot @eme64. Really appreciate all your help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2925407679 From epeter at openjdk.org Sat May 31 16:48:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 16:48:47 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v64] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. 
for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. 
> - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: rename template arguments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/4624d307..03c9514b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=63 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=62-63 Stats: 111 lines in 2 files changed: 0 ins; 0 del; 111 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Sat May 31 16:48:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 16:48:49 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 07:12:05 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 324: > >> 322: * A {@link Template} with one argument. >> 323: * >> 324: * @param arg0Name The name of the (first) argument, used for hashtag replacements in the {@link Template}. 
> > Nit and I'm okay with both: Should we name the first argument arg1 instead of arg0? Starting from zero might not be expected. I renamed it to arg1, arg2, arg3, with types T1, T2, T3. > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 341: > >> 339: * {@link Template}. >> 340: */ >> 341: public TemplateToken asToken(A a) { > > Could also be named `valueArg0` which is more expressive and it's easier when working in an IDE: > ![image](https://github.com/user-attachments/assets/ab44b841-2cd0-4c13-942c-0188f60b421c) > > vs. > > ![image](https://github.com/user-attachments/assets/63f06a73-1ecd-4592-bcad-adc637a4a096) > > You could apply that change for the `render()` methods as well. Same for the two and three arg versions. I renamed it to `arg1, arg2, arg3`, with types `T1, T2, T3`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118050972 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118050954 From epeter at openjdk.org Sat May 31 17:01:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 17:01:55 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v65] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. 
They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: good practice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/03c9514b..2d18fcae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=64 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=63-64 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From epeter at openjdk.org Sat May 31 17:01:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 17:01:57 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 07:16:35 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 539: > >> 537: * @param arg0Name The name of the (first) argument for hashtag replacement. >> 538: * @return A {@link Template} with one argument. >> 539: */ > > Just a general thought and not really something we can enforce by the framework, but we might want to mention here as well that the `arg0Name` string should match the lambda parameter for easier application and consistency? 
Theoretically (and not very clever), you can do that: > > var testTemplate = Template.make("a", "b", (Integer b, Integer a) -> body( > """ > public class Foo { > public static void main() { > int a1 = #a; > int b1 = #b; > """, > "int a2 = " + a + ";\n", // != a1, oops > "int b2 = " + b + ";\n", // != b1, oops > """ > } > } > """ > )); > > We could make the same remark in the two and three arg `make()` versions as well. Sure. I added this line: `Good practice but not enforced: {@code arg1Name}, {@code arg2Name}, and {@code arg3Name} should match the lambda argument names.` > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 771: > >> 769: } >> 770: boolean mutable = mutability == DataName.Mutability.MUTABLE; >> 771: if (0 >= weight || weight > 1000) { > > Could be more readable but up to you > Suggestion: > > if (weight <= 0 || weight > 1000) { Sure. I prefer going from small to large, so I'll do this: `if (weight <= 0 || 1000 < weight) {` > test/hotspot/jtreg/compiler/lib/template_framework/Template.java line 814: > >> 812: */ >> 813: static Token addStructuralName(String name, StructuralName.Type type, int weight) { >> 814: if (0 >= weight || weight > 1000) { > > Could be more readable but up to you > Suggestion: > > if (weight <= 0 || weight > 1000) { Sure. I prefer going from small to large, so I'll do this: `if (weight <= 0 || 1000 < weight) {` > test/hotspot/jtreg/testlibrary_tests/template_framework/tests/TestTemplate.java line 79: > >> 77: @Override >> 78: public boolean isSubtypeOf(DataName.Type other) { >> 79: return other instanceof MyPrimitive(String n) && n == name(); > > Is `==` wanted and not `equals()`? Good catch! I suppose it does not really matter here in this example, but it would be better practice. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118058874 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118060523 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118060526 PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118061219 From epeter at openjdk.org Sat May 31 17:05:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 17:05:09 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v61] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 07:48:27 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8344942-TemplateFramework-v3' of https://github.com/eme64/jdk into JDK-8344942-TemplateFramework-v3 >> - move verification > > test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java line 84: > >> 82: // Hint: if you want to see the generated source code, you can enable >> 83: // printing of the source code that the CompileFramework receives, >> 84: // with -DCompileFrameworkVerbose=true > > Maybe also add here that the printed output is not formatted and one might consider dumping it to an IDE or other tool to auto-format. Nice idea! Extended it like this: 82 // Hint: if you want to see the generated source code, you can enable 83 // printing of the source code that the CompileFramework receives, 84 // with -DCompileFrameworkVerbose=true + 85 // The code may not be nicely formatted, especially regarding + 86 // indentation. You might consider dumping the generated code + 87 // into an IDE or other auto-formatting tool. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24217#discussion_r2118064440 From epeter at openjdk.org Sat May 31 17:23:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 31 May 2025 17:23:52 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v66] In-Reply-To: References: Message-ID: > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. 
> - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more suggestions applied ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/2d18fcae..ea2bb65d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=65 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=64-65 Stats: 30 lines in 3 files changed: 16 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From eastigeevich at openjdk.org Sat May 31 19:43:00 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sat, 31 May 2025 19:43:00 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: <7jNBtwz-x_rh1MKzKY2dg-wo2H_pw8LnKbxm_YRzWQY=.fe8e5b6c-5546-42d7-987b-154bb7e6a521@github.com> On Fri, 30 May 2025 22:49:42 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to 
replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Change to ImmutableDataReferences src/hotspot/share/code/relocInfo.cpp line 383: > 381: // If the original call is to an address in the src CodeBuffer (such as a stub call) > 382: // the updated call should be to the corresponding address in dest CodeBuffer > 383: int offset = callee - orig_addr; Use `ptrdiff_t` instead of `int`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2118190036 From asmehra at openjdk.org Sat May 31 19:52:55 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Sat, 31 May 2025 19:52:55 GMT Subject: RFR: 8357175: Failure to generate or load AOT code should be handled gracefully In-Reply-To: References: Message-ID: On Thu, 29 May 2025 18:45:11 GMT, Vladimir Kozlov wrote: > By default a failed AOT code should be discarded with UL message about it by request (`-Xlog:aot+codecache+*=debug`) and VM and AOT code processing should continue run. > > Unless we hit some catastrophic failure: OOM for example. This is similar how JIT compilers behave. > > I reordered VM configuration settings checking (`Config::verify()`) so that we switch off AOT code caching type which depends on these VM settings. 
For example, AOT adapters do not operate on oops - they are not affected by compressed oops settings/encoding. I removed `_objectAlignment` check because CDS already does this check when open archive. > > The AOT relocation processing for a blob will skip this blob when corresponding address is not found instead of bailing out VM in product mode. In debug VM it will issue assert so we know about missing address. These changes are in `AOTCodeAddressTable::id_for_address()` > > I kept `fatal()` in `AOTCodeAddressTable::for_address_for_id()` for incorrect ID we read from archive. The archive could be corrupted if ID is wrong. > > I did small code cleanup/renaming. > > Tested: tier1-10 src/hotspot/share/code/aotCodeCache.cpp line 434: > 432: > 433: if (((_flags & enableContendedPadding) != 0) != EnableContended) { > 434: log_debug(aot, codecache, init)("AOT Code Cache disabled: it was created with EnableContended = %s", EnableContended ? "false" : "true"); This check says code cache is disabled, but we still return true. Same with other checks following this. Is that intentional? src/hotspot/share/code/aotCodeCache.cpp line 985: > 983: // ------------ process code and data -------------- > 984: > 985: #define BAD_ADDRESS_ID -2 Can you please add a comment to indicate why -1 is not used. From the comment in `id_for_address`, I guess it is because -1 is a valid id for representing jump to itself in static call stub. Is that correct?
int id = -1; if (addr == (address)-1) { // Static call stub has jump to itself return id; } src/hotspot/share/code/aotCodeCache.cpp line 1011: > 1009: } > 1010: case relocInfo::runtime_call_w_cp_type: > 1011: log_debug(aot, codecache, reloc)("runtime_call_w_cp_type relocation is not unimplemented"); typo: "relocation is not unimplemented" -> "relocation is unimplemented" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/25525#discussion_r2118176845 PR Review Comment: https://git.openjdk.org/jdk/pull/25525#discussion_r2118201960 PR Review Comment: https://git.openjdk.org/jdk/pull/25525#discussion_r2118177090 From eastigeevich at openjdk.org Sat May 31 20:35:02 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sat, 31 May 2025 20:35:02 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: <4MX-ARqN9xt8pvCYlsfv82_UVC_Kl0YbLOFUNemzRGY=.988198ac-01a5-4b33-bf02-5df8da2a0372@github.com> On Fri, 30 May 2025 22:49:42 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186). >> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. 
> > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Change to ImmutableDataReferences test/hotspot/jtreg/compiler/whitebox/DeoptimizeRelocatedNMethod.java line 131: > 129: > 130: // Deoptimized method > 131: WHITE_BOX.deoptimizeMethod(method); Should we invoke `function` before deoptimizing it to check it works? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2118233657 From asmehra at openjdk.org Sat May 31 20:50:02 2025 From: asmehra at openjdk.org (Ashutosh Mehra) Date: Sat, 31 May 2025 20:50:02 GMT Subject: RFR: 8358230: Incorrect location for the assert for blob != nullptr in CodeBlob::create Message-ID: A trivial fix that moves the assert for `blob != nullptr` before any usage of the `blob` ------------- Commit messages: - 8358230: Incorrect location for the assert for blob != nullptr in CodeBlob::create Changes: https://git.openjdk.org/jdk/pull/25566/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25566&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8358230 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/25566.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25566/head:pull/25566 PR: https://git.openjdk.org/jdk/pull/25566 From eastigeevich at openjdk.org Sat May 31 20:59:57 2025 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Sat, 31 May 2025 20:59:57 GMT Subject: RFR: 8316694: Implement relocation of nmethod within CodeCache [v21] In-Reply-To: References: Message-ID: On Fri, 30 May 2025 22:49:42 GMT, Chad Rakoczy wrote: >> This PR introduces a new function to replace nmethods, addressing [JDK-8316694](https://bugs.openjdk.org/browse/JDK-8316694). It enables the creation of new nmethods from existing ones, allowing method relocation in the code heap and supporting [JDK-8328186](https://bugs.openjdk.org/browse/JDK-8328186).
>> >> When an nmethod is replaced, a deep copy is performed. The corresponding Java method is updated to reference the new nmethod, while the old one is marked as unused. The garbage collector handles final cleanup and deallocation. >> >> This change does not modify existing code paths and therefore does not benefit much from existing tests. New tests were created and confirmed to pass on x64/aarch64 for slowdebug/fastdebug/release. > > Chad Rakoczy has updated the pull request incrementally with one additional commit since the last revision: > > Change to ImmutableDataReferences test/lib/jdk/test/whitebox/WhiteBox.java line 498: > 496: relocateNMethodFromMethod0(method, type); > 497: } > 498: public native void relocateNMethodFromAddr0(long address, int type); Why does the name have '0' at the end? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23573#discussion_r2118247481 From sparasa at openjdk.org Sat May 31 21:43:01 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Sat, 31 May 2025 21:43:01 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v23] In-Reply-To: References: <6aZaHfVvUJFLz83fyZ42bnoSGseaRBYd0jEg_VLdS2Q=.4c681def-ee7c-4fcd-b147-348d317ac58f@github.com> <1e-92EcDWshsTiFbEmJt8z5SAVfhf5vpr8sgbEq3BbQ=.25d6d5f7-48d3-4a13-ac7d-8844844490fa@github.com> <-kceEhIMg1R1fVYefLJ14cu5NeIRt2a_ZPw82ABwci8=.4dd111fc-bbf9-42dd-a17b-a3572a8c598d@github.com> Message-ID: On Wed, 28 May 2025 20:54:04 GMT, Srinivas Vamsi Parasa wrote: >>> @vamsi-parasa Testing looked good, though now you pushed some more changes. I'd like to run tests one more time before integration. Please let me know when you are ready :) >> >> Hi Emanuel (@eme64), >> >> Thanks for the update! The new changes got approved and are ready for testing. >> Could you please launch the tests? >> >> Thanks, >> Vamsi > >> @vamsi-parasa Launched! > > Hi Emanuel (@eme64), > > Could you pls let me know when the testing is completed? 
> Will integrate it if everything looks good. > > Thanks, > Vamsi > @vamsi-parasa @sviswa7 Testing passed! We are celebrating ascension weekend, hence we may be a little less responsive than usual ;) Thank you, Emanuel! :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2925768667 From duke at openjdk.org Sat May 31 21:43:02 2025 From: duke at openjdk.org (duke) Date: Sat, 31 May 2025 21:43:02 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same [v36] In-Reply-To: References: Message-ID: <3Y1lElsgEywa4vxlwEJB_4tqwDsVjmSW49InOto5oX8=.d0b50246-a0f8-44aa-8013-7a2ca06ab4db@github.com> On Thu, 22 May 2025 20:48:38 GMT, Srinivas Vamsi Parasa wrote: >> Intel APX NDD instructions are encoded using EVEX encoding. The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. >> >> For example: >> >> `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding >> `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding > > Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: > > refactor is_P6_or_later and remove cpu_family==18 @vamsi-parasa Your change (at version 3378eaee8ec66b295711c5ca5bbe4d1585b5131a) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24431#issuecomment-2925769917 From sparasa at openjdk.org Sat May 31 23:11:02 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Sat, 31 May 2025 23:11:02 GMT Subject: Integrated: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 01:15:36 GMT, Srinivas Vamsi Parasa wrote: > Intel APX NDD instructions are encoded using EVEX encoding. 
The goal of this PR is to enable optimized instruction encoding for Intel APX NDD instructions when the non-destructive destination is same as the first source. > > For example: > > `eaddl r18, r18, r25` can be encoded as `addl r18, r25` using APX REX2 encoding > `eaddl r2, r2, r7` can be encoded as `addl r2, r7` using non-APX legacy encoding This pull request has now been integrated. Changeset: fc3d3d9b Author: Srinivas Vamsi Parasa Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/fc3d3d9b303652275599e315b2d7e534d92080ea Stats: 3879 lines in 8 files changed: 1267 ins; 497 del; 2115 mod 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same Reviewed-by: sviswanathan, jbhateja, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24431
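
The two `eaddl` examples quoted in the PR description can be read as a small decision rule. The sketch below is only an illustration of that rule in Java, not the HotSpot assembler code: the class, enum, and method names are hypothetical, registers are modeled as plain indices (r0..r31), and it assumes that the APX extended registers r16..r31 are what forces REX2 over the legacy/REX form. It ignores everything else the real change has to consider (for example flag-suppressing variants, commutative operands, or memory operands).

```java
public class NddDemotionSketch {
    // Hypothetical labels for the three encodings mentioned in the description.
    public enum Encoding { EVEX, REX2, LEGACY }

    // Assumption: r16..r31 are the APX extended general purpose registers.
    public static boolean isExtended(int reg) {
        return reg >= 16;
    }

    // Picks an encoding for a three-operand NDD form "op dst, src1, src2".
    public static Encoding chooseEncoding(int dst, int src1, int src2) {
        if (dst != src1) {
            // Destination differs from the first source: the NDD form is needed.
            return Encoding.EVEX;
        }
        if (isExtended(dst) || isExtended(src2)) {
            // Two-operand form suffices, but an extended register is involved.
            return Encoding.REX2;
        }
        // Two-operand form using only r0..r15.
        return Encoding.LEGACY;
    }

    public static void main(String[] args) {
        System.out.println(chooseEncoding(18, 18, 25)); // REX2
        System.out.println(chooseEncoding(2, 2, 7));    // LEGACY
        System.out.println(chooseEncoding(1, 2, 3));    // EVEX
    }
}
```

The first two calls mirror `eaddl r18, r18, r25` and `eaddl r2, r2, r7` from the description; the third is an added case where no demotion applies.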