From gcao at openjdk.org Thu Jan 2 07:38:18 2025
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 2 Jan 2025 07:38:18 GMT
Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM
Message-ID:

Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.

### Testing
- [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
- [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
- [ ] Run TestVectorReinterpret.java tests on aarch64 with neon
- [ ] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU

-------------

Commit messages:
 - 8346922: TestVectorReinterpret.java fails without the rvv extension

Changes: https://git.openjdk.org/jdk/pull/22901/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22901&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8346922
  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/22901.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22901/head:pull/22901

PR: https://git.openjdk.org/jdk/pull/22901

From gcao at openjdk.org Thu Jan 2 07:44:19 2025
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 2 Jan 2025 07:44:19 GMT
Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM
Message-ID:

Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
### Testing
- [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
- [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
- [ ] Run TestVectorizationNegativeScale.java tests on aarch64 with neon
- [ ] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU

-------------

Commit messages:
 - 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM

Changes: https://git.openjdk.org/jdk/pull/22902/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22902&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8346924
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22902.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22902/head:pull/22902

PR: https://git.openjdk.org/jdk/pull/22902

From matthias at mernst.org Thu Jan 2 16:11:23 2025
From: matthias at mernst.org (Matthias Ernst)
Date: Thu, 2 Jan 2025 17:11:23 +0100
Subject: Bad assert in OuterStripMinedLoopNode::transform_to_counted_loop?
In-Reply-To:
References:
Message-ID:

In fact, the assertion can be triggered by just this code:

import static java.lang.foreign.ValueLayout.JAVA_LONG;

import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

public class Repro {
  static final int COUNT = 100000;
  static final MemorySegment segment = Arena.global().allocate(JAVA_LONG, COUNT);

  public static void main(String[] args) {
    var i = 0;
    var j = 0;
    while (i < COUNT) {
      segment.setAtIndex(JAVA_LONG, i++, 0);
      segment.setAtIndex(JAVA_LONG, j++, 0);
    }
  }
}

On Tue, Dec 31, 2024 at 13:26 Matthias Ernst wrote:

> Hi, I've come across an assertion tripping in JDK debug builds (24,
> latest, not 23) in combination with new foreign memory apis.
> Product builds seem to be functioning ok:
>
> # Internal Error (src/hotspot/share/opto/loopnode.cpp:3196), pid=27, tid=44
> # Error: assert(!loop->_body.contains(in)) failed
>
>
> I've been able to narrow it down somewhat, repro code/data as well as
> crash logs can be found here:
> https://github.com/mernst-github/repro/tree/main/loopnode-assertion (this
> was originally a standard `quicksort` on top of a MemorySegment).
>
> I have not been able to discover anything more systematic. Small changes
> to the input data, or using a heap array instead of a MemorySegment make
> the issue go away.
>
> Cheers
> Matthias
> (PS: lmk if this is not an opportune place to report such an issue)
>
>

From kxu at openjdk.org Thu Jan 2 18:59:36 2025
From: kxu at openjdk.org (Kangcheng Xu)
Date: Thu, 2 Jan 2025 18:59:36 GMT
Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v3]
In-Reply-To:
References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com>
Message-ID: <19Euq7y3NcY166nbpHDepVzMT-SNBEHzrmD5jDUHrM8=.37b6a69c-31b7-4167-93dd-04b554f25fc6@github.com>

On Wed, 4 Dec 2024 16:11:58 GMT, Kangcheng Xu wrote:

>> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits.
>>
>> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities.
>>
>> In other words, it transforms
>>
>>
>> for (int i = 0; (long) i < long_limit; i++) {...}
>>
>>
>> to
>>
>>
>> if (int_min <= long_limit && long_limit <= int_max ) {
>>     for (int i = 0; i < (int) long_limit; i++) {...}
>> } else {
>>     trap: loop_limit_check
>> }
>>
>>
>> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit.
>
> Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision:
>
>  - Update src/hotspot/share/opto/loopnode.cpp
>
>    Co-authored-by: Christian Hagedorn
>  - Update src/hotspot/share/opto/loopnode.cpp
>
>    Co-authored-by: Christian Hagedorn
>  - Update src/hotspot/share/opto/loopnode.cpp
>
>    Co-authored-by: Christian Hagedorn

Bump. Refactor is WIP

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2568228265

From mdoerr at openjdk.org Thu Jan 2 21:09:39 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 2 Jan 2025 21:09:39 GMT
Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1
In-Reply-To:
References:
Message-ID: <0ihL6ChjMqIzIsmK-P3KZZV0kGmmTeOeAj1Wo54AIHk=.3b6a1d1e-9e7b-4c7f-ad56-7b2492e1f1ee@github.com>

On Thu, 12 Dec 2024 13:12:45 GMT, Martin Doerr wrote:

> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c
> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1.
>
> Testing:
> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting"
> All passed.
>
> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting):
>
> `lir_abs` without this patch:
>
> vmovsd -0x63(%rip),%xmm1
> vpandnd %zmm0,%zmm1,%zmm0
>
>
> `lir_neg` without this patch:
>
> vmovsd -0x63(%rip),%xmm1
> vpxord %zmm0,%zmm1,%zmm0
>
>
> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.)
>
> `lir_abs` with this patch:
>
> vandpd 0xa1b213d(%rip),%xmm0,%xmm0
>
>
> `lir_neg` with this patch:
>
> vxorpd 0xa12585d(%rip),%xmm0,%xmm0
>
>
> New code is faster on our machine (using -XX:+UseKNLSetting).

@dholmes-ora: make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" has passed on our machine. Would you mind checking in your environment where the tier2 tests had failures with the KNLSetting? That would be nice.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2568378211

From kvn at openjdk.org Thu Jan 2 22:19:42 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 2 Jan 2025 22:19:42 GMT
Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM
In-Reply-To:
References:
Message-ID:

On Thu, 2 Jan 2025 07:33:32 GMT, Gui Cao wrote:

> Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>
> ### Testing
> - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
> - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
> - [x] Run TestVectorReinterpret.java tests on aarch64 with neon
> - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU

test/hotspot/jtreg/compiler/vectorapi/reshape/TestVectorReinterpret.java line 43:

> 41:  * @summary Test that vector reinterpret intrinsics work as intended.
> 42:  * @requires (os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64" | os.arch=="aarch64" | os.arch == "ppc64" | os.arch == "ppc64le" | os.arch == "s390x") |
> 43:  *           (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*")

Can it be as next?:

 * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*rvv.*"

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22901#discussion_r1901307105

From kvn at openjdk.org Thu Jan 2 22:21:55 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 2 Jan 2025 22:21:55 GMT
Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM
In-Reply-To:
References:
Message-ID:

On Thu, 2 Jan 2025 07:38:49 GMT, Gui Cao wrote:

> Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>
> ### Testing
> - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
> - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
> - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon
> - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU

test/hotspot/jtreg/compiler/vectorization/TestVectorizationNegativeScale.java line 29:

> 27:  * @summary [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs
> 28:  * @requires (os.arch=="x86" | os.arch=="i386" | os.arch=="amd64" | os.arch=="x86_64" | os.arch=="aarch64" | os.arch == "ppc64" | os.arch == "ppc64le" | os.arch == "s390x") |
> 29:  *           (os.arch == "riscv64" & vm.cpu.features ~= ".*rvv.*")

Can you try next:

 * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*rvv.*"

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22902#discussion_r1901308889

From gcao at openjdk.org Fri Jan 3 02:45:22 2025
From: gcao at openjdk.org (Gui Cao)
Date: Fri, 3 Jan 2025 02:45:22 GMT
Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID:

> Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>
> ### Testing
> - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
> - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
> - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon
> - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU

Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

  Update test requires

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22902/files
  - new: https://git.openjdk.org/jdk/pull/22902/files/17624372..a7b4b4e3

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22902&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22902&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22902.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22902/head:pull/22902

PR: https://git.openjdk.org/jdk/pull/22902

From gcao at openjdk.org Fri Jan 3 02:46:54 2025
From: gcao at openjdk.org (Gui Cao)
Date: Fri, 3 Jan 2025 02:46:54 GMT
Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID:

> Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
> > ### Testing > - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) > - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [x] Run TestVectorReinterpret.java tests on aarch64 with neon > - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU Gui Cao has updated the pull request incrementally with one additional commit since the last revision: Update test requires ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22901/files - new: https://git.openjdk.org/jdk/pull/22901/files/e48d42ba..5d0c3667 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22901&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22901&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22901.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22901/head:pull/22901 PR: https://git.openjdk.org/jdk/pull/22901 From gcao at openjdk.org Fri Jan 3 02:53:46 2025 From: gcao at openjdk.org (Gui Cao) Date: Fri, 3 Jan 2025 02:53:46 GMT Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM [v2] In-Reply-To: References: Message-ID: On Thu, 2 Jan 2025 22:18:32 GMT, Vladimir Kozlov wrote: > Can you try next: > > ``` > * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*rvv.*" > ``` Thanks for your review. it was very helpful and I have fixed it. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22902#discussion_r1901431720

From gcao at openjdk.org Fri Jan 3 02:54:39 2025
From: gcao at openjdk.org (Gui Cao)
Date: Fri, 3 Jan 2025 02:54:39 GMT
Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID: <2p1Sz8z5Bt2HVG3-5XcrOaDvw4YcKJwhsSzNqIGZKxI=.9858aa0d-30e4-4cad-bfab-9deab3f5f123@github.com>

On Thu, 2 Jan 2025 22:17:13 GMT, Vladimir Kozlov wrote:

> Can it be as next?:
>
> ```
> * @requires os.arch != "riscv64" | vm.cpu.features ~= ".*rvv.*"
> ```

Thanks for your review. It was very helpful and I have fixed it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22901#discussion_r1901431918

From fyang at openjdk.org Fri Jan 3 03:06:34 2025
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 3 Jan 2025 03:06:34 GMT
Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Jan 2025 02:46:54 GMT, Gui Cao wrote:

>> Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>>
>> ### Testing
>> - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
>> - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
>> - [x] Run TestVectorReinterpret.java tests on aarch64 with neon
>> - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test requires

LGTM. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22901#pullrequestreview-2528502687

From fyang at openjdk.org Fri Jan 3 03:07:34 2025
From: fyang at openjdk.org (Fei Yang)
Date: Fri, 3 Jan 2025 03:07:34 GMT
Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID: <4ccU6ay0Kvhnxz6K89xoYNx6vyLNq0tPKQQ5eDKK_J0=.d3fb46c8-4613-431b-a8a5-4b149bb89414@github.com>

On Fri, 3 Jan 2025 02:45:22 GMT, Gui Cao wrote:

>> Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>>
>> ### Testing
>> - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
>> - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
>> - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon
>> - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test requires

LGTM. Thanks.

-------------

Marked as reviewed by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22902#pullrequestreview-2528502731

From epeter at openjdk.org Fri Jan 3 08:27:54 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 3 Jan 2025 08:27:54 GMT
Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4]
In-Reply-To:
References:
Message-ID:

> **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.**
>
> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!
>
> **Goal**
>
> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).
>
> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.
>
> **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .
>
> **Details**
>
> This looks like a rather big patch, so let me explain the parts.
> - Refactor of `MemPointer` in `mempointer.hpp/cpp`:
>   - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
>   - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
>   - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
>   - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
> - Re-write of `VPointer` based on `MemPointer`:
>   - Old pattern:
>     - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]`
>     - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the a...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits:

 - copyright 2025
 - Merge branch 'master' into JDK-8343685-VPointer-MemPointer
 - manual merge
 - fix printing
 - rename
 - fix up print
 - add TestEquivalentInvariants.java
 - improve documentation
 - hide parser via delegation
 - Merge branch 'master' into JDK-8343685-VPointer-MemPointer
 - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295

-------------

Changes: https://git.openjdk.org/jdk/pull/21926/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=03
  Stats: 4047 lines in 18 files changed: 1849 ins; 1536 del; 662 mod
  Patch: https://git.openjdk.org/jdk/pull/21926.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21926/head:pull/21926

PR: https://git.openjdk.org/jdk/pull/21926

From epeter at openjdk.org Fri Jan 3 09:01:57 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 3 Jan 2025 09:01:57 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v7]
In-Reply-To:
References:
Message-ID:

On Mon, 23 Dec 2024 06:54:34 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes included with this patch:-
>>
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>>    - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details.
>> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
>> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
>> 9. X86 backend implementation for all supported intrinsics.
>> 10. Functional and Performance validation tests.
>>
>> Kindly review the patch and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   Review comments resolutions

src/hotspot/share/opto/type.cpp line 1465:

> 1463: //------------------------------meet-------------------------------------------
> 1464: // Compute the MEET of two types. It returns a new Type object.
> 1465: const Type *TypeH::xmeet( const Type *t ) const {

Suggestion:

const Type* TypeH::xmeet( const Type* t ) const {


Please check all other occurrences.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1901578736

From kvn at openjdk.org Fri Jan 3 15:41:44 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 3 Jan 2025 15:41:44 GMT
Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 3 Jan 2025 02:46:54 GMT, Gui Cao wrote:

>> Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>>
>> ### Testing
>> - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug)
>> - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug)
>> - [x] Run TestVectorReinterpret.java tests on aarch64 with neon
>> - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU
>
> Gui Cao has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update test requires

Good

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22901#pullrequestreview-2529295013

From kvn at openjdk.org Fri Jan 3 15:42:45 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 3 Jan 2025 15:42:45 GMT
Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM [v2]
In-Reply-To:
References:
Message-ID: <4MKmLj0HXoFvTxYtoY6o-UBo2H_IXPhweez2IuoGJMc=.948ee98f-8f5d-41b4-80c5-5ddb4c88022d@github.com>

On Fri, 3 Jan 2025 02:45:22 GMT, Gui Cao wrote:

>> Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it.
>> >> ### Testing >> - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) >> - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon >> - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update test requires Looks good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22902#pullrequestreview-2529298548 From jbhateja at openjdk.org Fri Jan 3 20:36:21 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 3 Jan 2025 20:36:21 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v8] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains nine additional commits since the last revision: - Updating copyright year of modified files. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342103 - Review suggestions incorporated. - Review comments resolutions - Addressing review comments - Fixing obfuscation due to intrinsic entries - Adding more test points - Adding missed check in container type detection. - C2 compiler support for float16 scalar operations. 
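
Items 7 and 8 of the summary in this message describe Float16 values travelling as `short` bit patterns and being reinterpreted around each operation. A rough Java-level picture of that storage convention is sketched below, assuming JDK 20 or later for `Float.floatToFloat16` and `Float.float16ToFloat`. The class and method names are hypothetical, and widening to `float` is only a stand-in for the computation; it is not the C2 implementation from this patch.

```java
public class Float16StorageSketch {
    // Adds two binary16 values given as raw short bit patterns.
    // The operands are widened to float, added, and rounded back to binary16.
    static short addFloat16Bits(short a, short b) {
        float sum = Float.float16ToFloat(a) + Float.float16ToFloat(b);
        return Float.floatToFloat16(sum);
    }

    public static void main(String[] args) {
        // Both inputs are exactly representable in binary16.
        short x = Float.floatToFloat16(1.5f);
        short y = Float.floatToFloat16(2.25f);
        short z = addFloat16Bits(x, y);
        System.out.printf("bits=0x%04x value=%s%n", z & 0xFFFF, Float.float16ToFloat(z));
    }
}
```

In this model every operation pays for a conversion on the way in and on the way out, which gives some intuition for why the patch folds back-to-back reinterpretations between consecutive Float16 operations.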
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22754/files
  - new: https://git.openjdk.org/jdk/pull/22754/files/dd444c44..d3cbf2c4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=06-07

  Stats: 17820 lines in 567 files changed: 13308 ins; 2583 del; 1929 mod
  Patch: https://git.openjdk.org/jdk/pull/22754.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754

PR: https://git.openjdk.org/jdk/pull/22754

From jbhateja at openjdk.org Fri Jan 3 20:42:15 2025
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 3 Jan 2025 20:42:15 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9]
In-Reply-To:
References:
Message-ID:

> Hi All,
>
> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>
> Following is the summary of changes included with this patch:-
>
> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>    - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details.
> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 9. X86 backend implementation for all supported intrinsics.
> 10. Functional and Performance validation tests.
>
> Kindly review the patch and share your feedback.
>
> Best Regards,
> Jatin

Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  Updating copyright year of modified files.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22754/files
  - new: https://git.openjdk.org/jdk/pull/22754/files/d3cbf2c4..175f4ed2

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=07-08

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22754.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754

PR: https://git.openjdk.org/jdk/pull/22754

From qamai at openjdk.org Sat Jan 4 16:11:22 2025
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sat, 4 Jan 2025 16:11:22 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v31]
In-Reply-To:
References:
Message-ID:

> Hi,
>
> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot.
>
> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance.
>
> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization.
>
> Please kindly review, thanks a lot.
>
> Testing
>
> - [x] GHA
> - [x] Linux x64, tier 1-4

Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 42 commits:

 - copyright
 - Merge branch 'master' into unsignedbounds
 - Merge branch 'master' into unsignedbounds
 - move try_cast to Type
 - Merge branch 'master' into unsignedbounds
 - build failure
 - build failures
 - whitespace
 - further reviews
 - Merge branch 'master' into unsignedbounds
 - ... and 32 more: https://git.openjdk.org/jdk/compare/07c9f713...4d330142

-------------

Changes: https://git.openjdk.org/jdk/pull/17508/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=30
  Stats: 2009 lines in 10 files changed: 1446 ins; 325 del; 238 mod
  Patch: https://git.openjdk.org/jdk/pull/17508.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508

PR: https://git.openjdk.org/jdk/pull/17508

From syan at openjdk.org Sun Jan 5 08:35:38 2025
From: syan at openjdk.org (SendaoYan)
Date: Sun, 5 Jan 2025 08:35:38 GMT
Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr
In-Reply-To:
References:
Message-ID:

On Thu, 7 Nov 2024 13:43:10 GMT, SendaoYan wrote:

> Hi all,
> The `Gtest codestrings.validate_vm` intermittently fails with different disassembly symbol names, such as different symbol names with instructions `adrp`/`b` etc. I think the difference in symbol names is acceptable; this PR removes the related symbol names to make the fragile disassembly comparison more robust.
> The change has been verified locally: the gtest was run 20k times and all runs passed, except that the subtest `ThreadsListHandle::sanity_vm` sometimes fails intermittently, which is recorded by [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk.

Hi, can anyone take a look at this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21955#issuecomment-2571546424

From aph at openjdk.org Sun Jan 5 09:32:34 2025
From: aph at openjdk.org (Andrew Haley)
Date: Sun, 5 Jan 2025 09:32:34 GMT
Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr
In-Reply-To:
References:
Message-ID:

On Thu, 7 Nov 2024 13:43:10 GMT, SendaoYan wrote:

> Hi all,
> The `Gtest codestrings.validate_vm` intermittently fails with different disassembly symbol names, such as different symbol names with instructions `adrp`/`b` etc.
I think the difference of symbol name is acceptable, this PR remove the releated symbol name to make the fragile disassemble identical compare more robustness. > The change has been verified locally, the gtest test run with 20k times all passed, except sometimes the subtest `ThreadsListHandle::sanity_vm` intermittent fails which has been recorded by [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk. test/hotspot/gtest/code/test_codestrings.cpp line 53: > 51: std::basic_string tmp6 = std::regex_replace(tmp5, std::regex("adrp[\\\\t\\s]+([wx][0-9]+).*"), "adrp $1 = "); > 52: std::basic_string tmp7 = std::regex_replace(tmp6, std::regex(", \\w+::\\w+.*"), ""); > 53: std::basic_string tmp8 = std::regex_replace(tmp7, std::regex("\\s+:\\s+udf\\t#0"), ""); It would help this reviewer if you added some commentary that explains what's going on. An example of the AArch64 output would be nice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21955#discussion_r1903231295 From syan at openjdk.org Sun Jan 5 11:23:44 2025 From: syan at openjdk.org (SendaoYan) Date: Sun, 5 Jan 2025 11:23:44 GMT Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr In-Reply-To: References: Message-ID: <0uRt6-eRb3UmG9YF9H_9ZIwqhCqFHCPi3sqHqeWUPsM=.f7dd57dc-409f-46fe-9526-442d4a14d37b@github.com> On Sun, 5 Jan 2025 09:30:02 GMT, Andrew Haley wrote: >> Hi all, >> The `Gtest codestrings.validate_vm` intermittent fails with different disassembly symbol name, such as different symbol name with instruction `adrp`/`b` etc. I think the difference of symbol name is acceptable, this PR remove the releated symbol name to make the fragile disassemble identical compare more robustness. 
>> The change has been verified locally, the gtest test run with 20k times all passed, except sometimes the subtest `ThreadsListHandle::sanity_vm` intermittent fails which has been recorded by [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk. > > test/hotspot/gtest/code/test_codestrings.cpp line 53: > >> 51: std::basic_string tmp6 = std::regex_replace(tmp5, std::regex("adrp[\\\\t\\s]+([wx][0-9]+).*"), "adrp $1 = "); >> 52: std::basic_string tmp7 = std::regex_replace(tmp6, std::regex(", \\w+::\\w+.*"), ""); >> 53: std::basic_string tmp8 = std::regex_replace(tmp7, std::regex("\\s+:\\s+udf\\t#0"), ""); > > It would help this reviewer if you added some commentary that explains what's going on. An example of the AArch64 output would be nice. Okey, I will add some commentary later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21955#discussion_r1903249381 From gcao at openjdk.org Mon Jan 6 03:27:41 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 6 Jan 2025 03:27:41 GMT Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM [v2] In-Reply-To: References: Message-ID: On Fri, 3 Jan 2025 02:45:22 GMT, Gui Cao wrote: >> Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it. >> >> ### Testing >> - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) >> - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon >> - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update test requires Thanks all for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22902#issuecomment-2572206724 From duke at openjdk.org Mon Jan 6 03:27:41 2025 From: duke at openjdk.org (duke) Date: Mon, 6 Jan 2025 03:27:41 GMT Subject: RFR: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM [v2] In-Reply-To: References: Message-ID: On Fri, 3 Jan 2025 02:45:22 GMT, Gui Cao wrote: >> Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it. >> >> ### Testing >> - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) >> - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon >> - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update test requires @zifeihan Your change (at version a7b4b4e3dd90af271115fbfe6ec58e5e30295693) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22902#issuecomment-2572207057 From gcao at openjdk.org Mon Jan 6 03:28:41 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 6 Jan 2025 03:28:41 GMT Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM [v2] In-Reply-To: References: Message-ID: On Fri, 3 Jan 2025 02:46:54 GMT, Gui Cao wrote: >> Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it. 
>> >> ### Testing >> - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) >> - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] Run TestVectorReinterpret.java tests on aarch64 with neon >> - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update test requires Thanks all for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22901#issuecomment-2572207000 From duke at openjdk.org Mon Jan 6 03:28:41 2025 From: duke at openjdk.org (duke) Date: Mon, 6 Jan 2025 03:28:41 GMT Subject: RFR: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM [v2] In-Reply-To: References: Message-ID: <-tksOcRWfFkudaY04MqbU6XAZUIvfLROubIl6KYKi_o=.17d78b7d-f346-400d-9ba5-d0307d8b49d1@github.com> On Fri, 3 Jan 2025 02:46:54 GMT, Gui Cao wrote: >> Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it. >> >> ### Testing >> - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) >> - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) >> - [x] Run TestVectorReinterpret.java tests on aarch64 with neon >> - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Update test requires @zifeihan Your change (at version 5d0c3667e567808de775298fae02728adbae9b88) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22901#issuecomment-2572207912 From gcao at openjdk.org Mon Jan 6 03:38:39 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 6 Jan 2025 03:38:39 GMT Subject: Integrated: 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM In-Reply-To: References: Message-ID: On Thu, 2 Jan 2025 07:38:49 GMT, Gui Cao wrote: > Hi, TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it. > > ### Testing > - [x] Run TestVectorizationNegativeScale.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) > - [x] Run TestVectorizationNegativeScale.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [x] Run TestVectorizationNegativeScale.java tests on aarch64 with neon > - [x] Run TestVectorizationNegativeScale.java tests on Xeon(R) Platinum 8378A CPU This pull request has now been integrated. Changeset: ca5390c4 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/ca5390c4d9a8744fbbfb0f378f7e31ac9486d0d6 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8346924: TestVectorizationNegativeScale.java fails without the rvv extension on RISCV fastdebug VM Reviewed-by: fyang, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22902 From gcao at openjdk.org Mon Jan 6 03:38:49 2025 From: gcao at openjdk.org (Gui Cao) Date: Mon, 6 Jan 2025 03:38:49 GMT Subject: Integrated: 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM In-Reply-To: References: Message-ID: <9ArN_4rTXHdjqzL_Z5rGcSK31glD0nI7IAyjuPc_-PA=.dc2bb185-a6c8-474f-9b17-a3d7f27a4f88@github.com> On Thu, 2 Jan 2025 07:33:32 GMT, Gui Cao wrote: > Hi, TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM, on riscv platform need to have rvv extension to run it. 
> > ### Testing > - [x] Run TestVectorReinterpret.java tests on SOPHON SG2042 without rvv1.0 (fastdebug) > - [x] Run TestVectorReinterpret.java tests on Banana Pi BPI-F3 board (with RVV1.0) (fastdebug) > - [x] Run TestVectorReinterpret.java tests on aarch64 with neon > - [x] Run TestVectorReinterpret.java tests on Xeon(R) Platinum 8378A CPU This pull request has now been integrated. Changeset: e98f4126 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/e98f41266346aa676a3e764528806f2b82ec7e46 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8346922: TestVectorReinterpret.java fails without the rvv extension on RISCV fastdebug VM Reviewed-by: fyang, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22901 From epeter at openjdk.org Mon Jan 6 07:44:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 Jan 2025 07:44:57 GMT Subject: RFR: 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make Message-ID: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> Extracted from cost-model code https://github.com/openjdk/jdk/pull/20964. Currently, some nodes are only generated in their dedicated methods: - VectorNode::shift_count - LShiftCntVNode - RShiftCntVNode - VectorCastNode::make - Vector(U)Cast... - VectorBlendNode has no corresponding "factory" method. The goal is to have all available under VectorNode::make, so that they can be generated simply with the vector opcode. This is helpful for the plans with the cost-model, where the VTransform nodes will only carry the vector-opc, and I need to generate vectors for these vector-opc. 
------------- Commit messages: - JDK-8340093 Changes: https://git.openjdk.org/jdk/pull/22917/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22917&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346993 Stats: 48 lines in 2 files changed: 22 ins; 15 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/22917.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22917/head:pull/22917 PR: https://git.openjdk.org/jdk/pull/22917 From epeter at openjdk.org Mon Jan 6 07:44:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 6 Jan 2025 07:44:57 GMT Subject: RFR: 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make In-Reply-To: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> References: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> Message-ID: <8Bw8ZDdeQJaRC7SSsz78LcqGC5W-4MuIm6lc2KSZ5DA=.575cff5f-288e-47b8-9826-45edb5ffab6b@github.com> On Fri, 3 Jan 2025 15:44:02 GMT, Emanuel Peter wrote: > Extracted from cost-model code https://github.com/openjdk/jdk/pull/20964. > > Currently, some nodes are only generated in their dedicated methods: > - VectorNode::shift_count > - LShiftCntVNode > - RShiftCntVNode > - VectorCastNode::make > - Vector(U)Cast... > - VectorBlendNode has no corresponding "factory" method. > > The goal is to have all available under VectorNode::make, so that they can be generated simply with the vector opcode. This is helpful for the plans with the cost-model, where the VTransform nodes will only carry the vector-opc, and I need to generate vectors for these vector-opc. src/hotspot/share/opto/vectornode.cpp line 669: > 667: } > 668: > 669: // Make a vector node for unary or binary operation Note: there were already unary cases, this just fixes the comment. 
src/hotspot/share/opto/vectornode.cpp line 752: > 750: case Op_LShiftCntV: return new LShiftCntVNode(n1, vt); > 751: case Op_RShiftCntV: return new RShiftCntVNode(n1, vt); > 752: Note: used to be in `VectorNode::shift_count` src/hotspot/share/opto/vectornode.cpp line 783: > 781: case Op_VectorCastHF2F: return new VectorCastHF2FNode(n1, vt); > 782: case Op_VectorCastF2HF: return new VectorCastF2HFNode(n1, vt); > 783: Note: used to be in `VectorCastNode::make` src/hotspot/share/opto/vectornode.cpp line 856: > 854: default: > 855: fatal("Node class '%s' is not supported for shift count", NodeClassNames[opc]); > 856: return -1; Note: splitting the old `VectorNode::shift_count` (scalar opc -> L/RShiftCntVNode). - Created new `VectorNode::shift_count_opcode` (scalar opc -> vector shift cnt opc) - Moved node creation to `VectorNode::make` (vector shift cnt opc -> L/RShiftCntVNode) I need this for later VTransform changes, where I need to do the vector-opc determination in one stage (VTransform build), and then the vector-node creation in a later stage (VTransform apply). 
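Editorial sketch of the two-stage split described in the note above, written in Java only for brevity (the real code is C++ in vectornode.cpp). The opcode names follow the ones mentioned in this thread, but `ScalarOp`, `VectorOp`, `VectorNode`, `shiftCountOpcode` and `make` below are hypothetical stand-ins, and the scalar-to-vector mapping is an assumption rather than a copy of the HotSpot sources:

```java
public class ShiftCountSketch {
    // Hypothetical stand-ins for scalar shift opcodes and vector shift-count opcodes.
    enum ScalarOp { LShiftI, LShiftL, RShiftI, RShiftL, URShiftI, URShiftL }
    enum VectorOp { LShiftCntV, RShiftCntV }

    // Hypothetical stand-in for a created vector node.
    static final class VectorNode {
        final VectorOp opcode;
        final String input;
        final int length;

        VectorNode(VectorOp opcode, String input, int length) {
            this.opcode = opcode;
            this.input = input;
            this.length = length;
        }
    }

    // Stage 1 (for example at VTransform build time): scalar opcode -> vector shift-count opcode.
    static VectorOp shiftCountOpcode(ScalarOp opc) {
        switch (opc) {
            case LShiftI:
            case LShiftL:
                return VectorOp.LShiftCntV;
            case RShiftI:
            case RShiftL:
            case URShiftI:
            case URShiftL:
                return VectorOp.RShiftCntV;
            default:
                throw new IllegalArgumentException("not supported for shift count: " + opc);
        }
    }

    // Stage 2 (for example at VTransform apply time): node creation keyed only by the vector opcode.
    static VectorNode make(VectorOp vopc, String input, int length) {
        return new VectorNode(vopc, input, length);
    }

    public static void main(String[] args) {
        VectorOp vopc = shiftCountOpcode(ScalarOp.URShiftI); // decided early
        VectorNode node = make(vopc, "cnt", 8);              // created later
        System.out.println(node.opcode + " over " + node.length + " lanes");
    }
}
```

The point of the split is that the first stage only needs to store an opcode, so node creation can be deferred to a single factory.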
src/hotspot/share/opto/vectornode.cpp line 1437: > 1435: VectorNode* VectorCastNode::make(int vopc, Node* n1, BasicType bt, uint vlen) { > 1436: const TypeVect* vt = TypeVect::make(bt, vlen); > 1437: return VectorNode::make(vopc, n1, nullptr, vt); Note: the code is moved to `VectorNode::make` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22917#discussion_r1901918030 PR Review Comment: https://git.openjdk.org/jdk/pull/22917#discussion_r1901918317 PR Review Comment: https://git.openjdk.org/jdk/pull/22917#discussion_r1901918499 PR Review Comment: https://git.openjdk.org/jdk/pull/22917#discussion_r1901922127 PR Review Comment: https://git.openjdk.org/jdk/pull/22917#discussion_r1901922545 From dfenacci at openjdk.org Mon Jan 6 08:06:35 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 Jan 2025 08:06:35 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V In-Reply-To: References: Message-ID: On Tue, 24 Dec 2024 05:23:42 GMT, Fei Yang wrote: > Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: > TEST: compiler/c2/irTests/ModINodeIdealizationTests.java > TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java > > These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. > > Some further information: > An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786).
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 This looks good as a temporary fix. Thanks @RealFYang (you might just want to change the copyright date: it's that time of the year again) ------------- PR Review: https://git.openjdk.org/jdk/pull/22874#pullrequestreview-2531562276 From chagedorn at openjdk.org Mon Jan 6 08:11:40 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Jan 2025 08:11:40 GMT Subject: RFR: 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make In-Reply-To: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> References: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> Message-ID: On Fri, 3 Jan 2025 15:44:02 GMT, Emanuel Peter wrote: > Extracted from cost-model code https://github.com/openjdk/jdk/pull/20964. > > Currently, some nodes are only generated in their dedicated methods: > - VectorNode::shift_count > - LShiftCntVNode > - RShiftCntVNode > - VectorCastNode::make > - Vector(U)Cast... > - VectorBlendNode has no corresponding "factory" method. > > The goal is to have all available under VectorNode::make, so that they can be generated simply with the vector opcode. This is helpful for the plans with the cost-model, where the VTransform nodes will only carry the vector-opc, and I need to generate vectors for these vector-opc. Looks good to me. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22917#pullrequestreview-2531569151 From christian.hagedorn at oracle.com Mon Jan 6 09:37:03 2025 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 6 Jan 2025 10:37:03 +0100 Subject: Bad assert in OuterStripMinedLoopNode::transform_to_counted_loop?
In-Reply-To: References: Message-ID: <8a00acc1-499b-4b26-84c3-f9228e46d36f@oracle.com> Hi Matthias Thanks for your report! I was able to reproduce this and trace it back to JDK-8343394 [1]. But this only changed the underlying memory segment Java code. So, this must have revealed an existing issue. I filed JDK-8347040 [2] for this bug. Best regards, Christian [1] https://bugs.openjdk.org/browse/JDK-8343394 [2] https://bugs.openjdk.org/browse/JDK-8347040 On 02.01.25 17:11, Matthias Ernst wrote: > In fact, the assertion can be triggered by just this code: > > import static java.lang.foreign.ValueLayout.JAVA_LONG; > > import java.lang.foreign.Arena; > import java.lang.foreign.MemorySegment; > > public class Repro { > static final int COUNT = 100000; > static final MemorySegment segment = Arena.global().allocate(JAVA_LONG, COUNT); > > public static void main(String[] args) { > var i = 0; > var j = 0; > while (i < COUNT) { > segment.setAtIndex(JAVA_LONG, i++, 0); > segment.setAtIndex(JAVA_LONG, j++, 0); > } > } > } > > > > On Tue, Dec 31, 2024 at 13:26 Matthias Ernst > wrote: > > Hi, I've come across an assertion tripping in JDK debug builds (24, latest, > not 23) in combination with new foreign memory apis. Product builds seem to > be functioning ok: > > # Internal Error (src/hotspot/share/opto/loopnode.cpp:3196), pid=27, tid=44 > # Error: assert(!loop->_body.contains(in)) failed > > > I've been able to narrow it down somewhat, repro code/data as well as crash > logs can be found here: https://github.com/mernst-github/repro/tree/main/loopnode-assertion (this was originally a standard `quicksort` on top of a > MemorySegment). > > I have not been able to discover anything more systematic. Small changes to > the input data, or using a heap array instead of a MemorySegment make the > issue go away.
> > Cheers > Matthias > (PS: lmk if this is not an opportune place to report such an issue) > From shade at openjdk.org Mon Jan 6 09:39:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Jan 2025 09:39:15 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles [v3] In-Reply-To: References: Message-ID: > Noticed this when looking through JMH compiler profiler results. > > Current `CompilerBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only records the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. > > While this seems to be a long-standing behavior, there are problems with this: > 1. This is not what "total" means. > 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. > 3. It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation. > > Additional testing: > - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8346264-compmxbean-account-all - Update comment - Fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22760/files - new: https://git.openjdk.org/jdk/pull/22760/files/67a11988..2ed82a16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22760&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22760&range=01-02 Stats: 20004 lines in 630 files changed: 14939 ins; 2945 del; 2120 mod Patch: https://git.openjdk.org/jdk/pull/22760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22760/head:pull/22760 PR: https://git.openjdk.org/jdk/pull/22760 From shade at openjdk.org Mon Jan 6 09:39:15 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Jan 2025 09:39:15 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles [v2] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 11:25:50 GMT, Aleksey Shipilev wrote: >> Noticed this when looking through JMH compiler profiler results. >> >> Current `CompilerBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only records the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. >> >> While this seems to be a long-standing behavior, there are problems with this: >> 1. This is not what "total" means. >> 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. >> 3. It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation. 
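Editorial sketch: for readers who want to observe the counter discussed above, the value can be read through the standard `java.lang.management` API. This is a minimal probe, assuming a JVM that has a JIT compiler and supports compilation time monitoring; the class name `CompileTimeProbe`, the helper `totalCompilationMillis` and the warm-up loop are illustrative and not part of the PR:

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class CompileTimeProbe {
    // Returns the accumulated compilation time in milliseconds,
    // or -1 if the JVM has no compiler or does not monitor compilation time.
    static long totalCompilationMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1L;
        }
        return bean.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();

        // Illustrative warm-up work so that the JIT has something to compile.
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sink += Integer.bitCount(i);
        }

        long after = totalCompilationMillis();
        System.out.println("sink=" + sink);
        System.out.println("total compile time before=" + before + " ms, after=" + after + " ms");
    }
}
```

Whether time spent in failed or bailed-out compilations shows up in this value depends on the JDK build, which is the behavior this PR discusses.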
>> >> Additional testing: >> - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update comment Happy new year! I will respin GHA testing for this and then integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22760#issuecomment-2572726838 From chagedorn at openjdk.org Mon Jan 6 10:08:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 6 Jan 2025 10:08:13 GMT Subject: RFR: 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor [v2] In-Reply-To: References: Message-ID: > This patch is a follow up to the clean-ups done with [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945) and introduces a new predicate visitor for Loop Unswitching to update the last remaining custom predicate cloning code. > > This patch includes the following: > > - New `CloneUnswitchedLoopPredicatesVisitor` class which delegates the cloning work to a new `ClonePredicateToTargetLoop` class. > - We walk the predicate chain in the `PredicateIterator` and call the `CloneUnswitchedLoopPredicatesVisitor` for each visited predicate. Then we clone the predicate on the fly to the target loop. > - New `ClonePredicateToTargetLoop` class: > - Clones Parse Predicates > - Clones Template Assertion Predicates > - Includes rewiring of control dependent data nodes > - Rewires the cloned predicates to the target loop with new `TargetLoopPredicateChain` class: > - Keeps track of the current chain head, which is the target loop itself when the chain is still empty. > - Each time a new predicate is inserted at the target loop, the old predicate chain head is set as output of the new predicate. > - An example is shown as class comment at `TargetLoopPredicateChain`. 
> - I plan to reuse this class later again when also updating `CreateAssertionPredicatesVisitor` which is done when we tackle the actual still remaining Assertion Predicate bugs. > - Removal of custom predicate cloning code found in `PhaseIdealLoop`. > - Changed steps performed in Loop Unswitching from: > 1. Clone loop > 2. Clone predicates and insert them below the unswitched loop selector If projections > 3. Connect the cloned predicates to the unswitched loops > > to: > > 1. Clone loop > 2. Connect unswitched loop selector If projections to unswitched loops such that they are now the new loop entries > 3. Clone predicates and insert them between the unswitched loop selector If projections and the unswitched loops > - Rename/update `get_template_assertion_predicates()`/`TemplateAssertionPredicateCollector` to reflect the only use left. > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8344035 - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22828/files - new: https://git.openjdk.org/jdk/pull/22828/files/9cc2db29..acc729f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22828&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22828&range=00-01 Stats: 17581 lines in 483 files changed: 13245 ins; 2540 del; 1796 mod Patch: https://git.openjdk.org/jdk/pull/22828.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22828/head:pull/22828 PR: https://git.openjdk.org/jdk/pull/22828 From fyang at openjdk.org Mon Jan 6 11:03:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 Jan 2025 11:03:09 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: > Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: > TEST: compiler/c2/irTests/ModINodeIdealizationTests.java > TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java > > These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. > > Some further information: > An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786).
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22874/files - new: https://git.openjdk.org/jdk/pull/22874/files/3f175370..27884e81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22874&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22874&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22874/head:pull/22874 PR: https://git.openjdk.org/jdk/pull/22874 From fyang at openjdk.org Mon Jan 6 11:03:09 2025 From: fyang at openjdk.org (Fei Yang) Date: Mon, 6 Jan 2025 11:03:09 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 08:03:39 GMT, Damon Fenacci wrote: > This looks good as a temporary fix. Thanks @RealFYang (you might just want to change the copyright date: it's that time of the year again) Thanks for having a look! I have updated the copyright year for both of them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22874#issuecomment-2572872506 From dfenacci at openjdk.org Mon Jan 6 11:54:34 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 Jan 2025 11:54:34 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 11:03:09 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support.
See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. >> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Marked as reviewed by dfenacci (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22874#pullrequestreview-2531983317 From syan at openjdk.org Mon Jan 6 12:45:43 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 6 Jan 2025 12:45:43 GMT Subject: RFR: 8346965: Test compiler/ciReplay/TestInlining.java fails with -XX:+SegmentedCodeCache Message-ID: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> Hi all, Four tests fail when run with the JVM option '-XX:+SegmentedCodeCache'. The JVM option '-XX:ReservedCodeCacheSize=4m' set [inside the test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/ciReplay/CiReplayBase.java#L68) conflicts with the JVM option '-XX:+SegmentedCodeCache' when the latter is passed from outside the test. This PR adds '-XX:-SegmentedCodeCache' explicitly inside the test so that the test passes whether or not '-XX:+SegmentedCodeCache' is passed from outside. The change has been verified locally; it is a test-only fix that makes the test more robust, with no risk.
'SegmentedCodeCache' is enabled by default > java -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" uintx ReservedCodeCacheSize = 251658240 {pd product} {ergonomic} bool SegmentedCodeCache = true {product} {ergonomic} Passing '-XX:ReservedCodeCacheSize=4m' to the JVM automatically disables `SegmentedCodeCache` > java -XX:ReservedCodeCacheSize=4m -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" uintx ReservedCodeCacheSize = 4194304 {pd product} {command line} bool SegmentedCodeCache = false {product} {default} '-XX:+SegmentedCodeCache' conflicts with '-XX:ReservedCodeCacheSize=4m' > java -XX:ReservedCodeCacheSize=4m -XX:+SegmentedCodeCache -version Error occurred during initialization of VM Invalid code heap sizes: NonNMethodCodeHeapSize (8006K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (4K) = 8014K is greater than ReservedCodeCacheSize (4096K). ------------- Commit messages: - 8346965: Test compiler/ciReplay/TestInlining.java fails with -XX:+SegmentedCodeCache Changes: https://git.openjdk.org/jdk/pull/22926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22926&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346965 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22926/head:pull/22926 PR: https://git.openjdk.org/jdk/pull/22926 From dfenacci at openjdk.org Mon Jan 6 14:36:34 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 6 Jan 2025 14:36:34 GMT Subject: RFR: 8346965: Test compiler/ciReplay/TestInlining.java fails with -XX:+SegmentedCodeCache In-Reply-To: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> References: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> Message-ID: <1nVShr7KjkuLaCf1kIt2VG2l46W0smxzpQu2-vRTDPA=.1cfbffee-4803-4deb-a21e-ccac75227024@github.com> On
Mon, 6 Jan 2025 12:41:13 GMT, SendaoYan wrote: > Hi all, > 4 tests fail when run with the JVM option '-XX:+SegmentedCodeCache'. The JVM option '-XX:ReservedCodeCacheSize=4m' set [inside the test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/ciReplay/CiReplayBase.java#L68) conflicts with the JVM option '-XX:+SegmentedCodeCache' passed from outside the test. This PR adds '-XX:-SegmentedCodeCache' explicitly inside the test so that it passes whether or not '-XX:+SegmentedCodeCache' is passed from outside. The change has been verified locally; it is a test-only fix that makes the tests more robust, no risk. > > 'SegmentedCodeCache' is enabled by default > >> java -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 251658240 {pd product} {ergonomic} > bool SegmentedCodeCache = true {product} {ergonomic} > > > Passing '-XX:ReservedCodeCacheSize=4m' to the JVM automatically disables `SegmentedCodeCache` > >> java -XX:ReservedCodeCacheSize=4m -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 4194304 {pd product} {command line} > bool SegmentedCodeCache = false {product} {default} > > > '-XX:+SegmentedCodeCache' conflicts with '-XX:ReservedCodeCacheSize=4m' > >> java -XX:ReservedCodeCacheSize=4m -XX:+SegmentedCodeCache -version > Error occurred during initialization of VM > Invalid code heap sizes: NonNMethodCodeHeapSize (8006K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (4K) = 8014K is greater than ReservedCodeCacheSize (4096K). Also, as there are 4 failing tests, it might be a good idea to update the title.
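For readers following the thread, the arithmetic behind the "Invalid code heap sizes" error quoted above can be sketched as follows. This only illustrates the sum reported in the error message, using the sizes printed there; it is not HotSpot's actual check, and the class and method names (`CodeHeapSizeCheck`, `fits`) are hypothetical.

```java
public class CodeHeapSizeCheck {
    // Hypothetical helper: true when the three code heaps fit into the
    // reserved code cache. All sizes are in KB, as in the error message.
    static boolean fits(long nonNMethodK, long profiledK, long nonProfiledK, long reservedK) {
        return nonNMethodK + profiledK + nonProfiledK <= reservedK;
    }

    public static void main(String[] args) {
        // Sizes taken from the quoted error output.
        long total = 8006 + 4 + 4;
        // With -XX:ReservedCodeCacheSize=4m the cache is 4096K, so the heaps do not fit.
        System.out.println(total + "K > 4096K: " + !fits(8006, 4, 4, 4096));
        // With the default size quoted above (251658240 bytes = 245760K) they do fit.
        System.out.println("fits in 245760K: " + fits(8006, 4, 4, 245760));
    }
}
```

Under these assumptions the segmented layout needs 8014K, which is why the 4m cache set inside the test cannot be combined with '-XX:+SegmentedCodeCache'.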
------------- PR Comment: https://git.openjdk.org/jdk/pull/22926#issuecomment-2573238287 From syan at openjdk.org Mon Jan 6 15:15:34 2025 From: syan at openjdk.org (SendaoYan) Date: Mon, 6 Jan 2025 15:15:34 GMT Subject: RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <1nVShr7KjkuLaCf1kIt2VG2l46W0smxzpQu2-vRTDPA=.1cfbffee-4803-4deb-a21e-ccac75227024@github.com> References: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> <1nVShr7KjkuLaCf1kIt2VG2l46W0smxzpQu2-vRTDPA=.1cfbffee-4803-4deb-a21e-ccac75227024@github.com> Message-ID: <6fNfWjlnufG2HgukGakRB2LHhThPNHTXMEp-pTl73dQ=.ffc948c9-8275-48ff-b249-851ca8883384@github.com> On Mon, 6 Jan 2025 14:34:21 GMT, Damon Fenacci wrote: > Also, as the failing tests are 4, it might be a good idea to update the title. Okay, the title has been updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22926#issuecomment-2573315897 From shade at openjdk.org Mon Jan 6 15:39:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 6 Jan 2025 15:39:44 GMT Subject: Integrated: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 10:56:48 GMT, Aleksey Shipilev wrote: > Noticed this when looking through JMH compiler profiler results. > > Current `CompilerBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only records the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. > > While this seems to be a long-standing behavior, there are problems with this: > 1. This is not what "total" means. > 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. > 3. 
It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation. > > Additional testing: > - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` This pull request has now been integrated. Changeset: 12700cb8 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/12700cb81bdfc006bcd228e43b509b8810af6549 Stats: 11 lines in 1 file changed: 5 ins; 6 del; 0 mod 8346264: "Total compile time" counter should include time spent in failing/bailout compiles Reviewed-by: kvn, mli ------------- PR: https://git.openjdk.org/jdk/pull/22760 From kvn at openjdk.org Mon Jan 6 17:35:36 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 Jan 2025 17:35:36 GMT Subject: RFR: 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make In-Reply-To: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> References: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> Message-ID: On Fri, 3 Jan 2025 15:44:02 GMT, Emanuel Peter wrote: > Extracted from cost-model code https://github.com/openjdk/jdk/pull/20964. > > Currently, some nodes are only generated in their dedicated methods: > - VectorNode::shift_count > - LShiftCntVNode > - RShiftCntVNode > - VectorCastNode::make > - Vector(U)Cast... > - VectorBlendNode has no corresponding "factory" method. > > The goal is to have all available under VectorNode::make, so that they can be generated simply with the vector opcode. This is helpful for the plans with the cost-model, where the VTransform nodes will only carry the vector-opc, and I need to generate vectors for these vector-opc. Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22917#pullrequestreview-2532636710 From kvn at openjdk.org Mon Jan 6 18:05:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 6 Jan 2025 18:05:37 GMT Subject: RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> References: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> Message-ID: <9DKY4bhfcxdPkhUMZq05ack0A0A7x75VvZ2HpG3NkuA=.ff666109-0205-4a1e-b8f2-9afcca1b5b84@github.com> On Mon, 6 Jan 2025 12:41:13 GMT, SendaoYan wrote: > Hi all, > There are 4 tests fails run with JVM option '-XX:+SegmentedCodeCache'. JVM option '-XX:ReservedCodeCacheSize=4m' [inside test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/ciReplay/CiReplayBase.java#L68) conflict with JVM option '-XX:+SegmentedCodeCache' which pass from outside test. This PR add '-XX:-SegmentedCodeCache' explicitly inside test to make test run success whenever received '-XX:+SegmentedCodeCache' outside or not. Change has been verified locally, test-fix only, make test more robustness, no risk. 
> > 'SegmentedCodeCache' is enable by default > >> java -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 251658240 {pd product} {ergonomic} > bool SegmentedCodeCache = true {product} {ergonomic} > > > Pass '-XX:ReservedCodeCacheSize=4m' to JVM will automatic disable `SegmentedCodeCache` > >> java -XX:ReservedCodeCacheSize=4m -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 4194304 {pd product} {command line} > bool SegmentedCodeCache = false {product} {default} > > > '-XX:+SegmentedCodeCache' conflict with '-XX:ReservedCodeCacheSize=4m' > >> java -XX:ReservedCodeCacheSize=4m -XX:+SegmentedCodeCache -version > Error occurred during initialization of VM > Invalid code heap sizes: NonNMethodCodeHeapSize (8006K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (4K) = 8014K is greater than ReservedCodeCacheSize (4096K). Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22926#pullrequestreview-2532686600 From fjiang at openjdk.org Tue Jan 7 00:48:36 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Tue, 7 Jan 2025 00:48:36 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 11:03:09 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support. See ModLNode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. 
>> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tunning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Looks good? ------------- Marked as reviewed by fjiang (Committer). PR Review: https://git.openjdk.org/jdk/pull/22874#pullrequestreview-2533197980 From fyang at openjdk.org Tue Jan 7 00:53:35 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 Jan 2025 00:53:35 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 11:51:57 GMT, Damon Fenacci wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Comment > > Marked as reviewed by dfenacci (Committer). @dafedafe @feilongjiang : Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22874#issuecomment-2574185274 From syan at openjdk.org Tue Jan 7 01:14:39 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 7 Jan 2025 01:14:39 GMT Subject: RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> References: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> Message-ID: On Mon, 6 Jan 2025 12:41:13 GMT, SendaoYan wrote: > Hi all, > There are 4 tests fails run with JVM option '-XX:+SegmentedCodeCache'. 
The JVM option '-XX:ReservedCodeCacheSize=4m' set [inside the test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/ciReplay/CiReplayBase.java#L68) conflicts with the JVM option '-XX:+SegmentedCodeCache' passed from outside the test. This PR adds '-XX:-SegmentedCodeCache' explicitly inside the test so that it passes whether or not '-XX:+SegmentedCodeCache' is passed from outside. The change has been verified locally; it is a test-only fix that makes the tests more robust, no risk. > > 'SegmentedCodeCache' is enabled by default > >> java -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 251658240 {pd product} {ergonomic} > bool SegmentedCodeCache = true {product} {ergonomic} > > > Passing '-XX:ReservedCodeCacheSize=4m' to the JVM automatically disables `SegmentedCodeCache` > >> java -XX:ReservedCodeCacheSize=4m -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 4194304 {pd product} {command line} > bool SegmentedCodeCache = false {product} {default} > > > '-XX:+SegmentedCodeCache' conflicts with '-XX:ReservedCodeCacheSize=4m' > >> java -XX:ReservedCodeCacheSize=4m -XX:+SegmentedCodeCache -version > Error occurred during initialization of VM > Invalid code heap sizes: NonNMethodCodeHeapSize (8006K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (4K) = 8014K is greater than ReservedCodeCacheSize (4096K). GHA reports 1 failure: 1. The 'macos-x64 / test - Test (tier1)' job fails with 'GitHub Actions 11 lost communication with the server.'; this looks like an environmental issue and is unrelated to this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22926#issuecomment-2574205150 From epeter at openjdk.org Tue Jan 7 06:18:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 06:18:45 GMT Subject: RFR: 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make In-Reply-To: References: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> Message-ID: On Mon, 6 Jan 2025 17:32:50 GMT, Vladimir Kozlov wrote: >> Extracted from cost-model code https://github.com/openjdk/jdk/pull/20964. >> >> Currently, some nodes are only generated in their dedicated methods: >> - VectorNode::shift_count >> - LShiftCntVNode >> - RShiftCntVNode >> - VectorCastNode::make >> - Vector(U)Cast... >> - VectorBlendNode has no corresponding "factory" method. >> >> The goal is to have all available under VectorNode::make, so that they can be generated simply with the vector opcode. This is helpful for the plans with the cost-model, where the VTransform nodes will only carry the vector-opc, and I need to generate vectors for these vector-opc. > > Looks good to me too. Thanks @vnkozlov @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22917#issuecomment-2574469626 From epeter at openjdk.org Tue Jan 7 06:18:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 06:18:46 GMT Subject: Integrated: 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make In-Reply-To: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> References: <5TzH_HHTi_8QL2wh0BEGuoncesUI6cNjkBToW481iOs=.1767bf74-76e7-41dc-8972-584ee34e188f@github.com> Message-ID: On Fri, 3 Jan 2025 15:44:02 GMT, Emanuel Peter wrote: > Extracted from cost-model code https://github.com/openjdk/jdk/pull/20964. 
> > Currently, some nodes are only generated in their dedicated methods: > - VectorNode::shift_count > - LShiftCntVNode > - RShiftCntVNode > - VectorCastNode::make > - Vector(U)Cast... > - VectorBlendNode has no corresponding "factory" method. > > The goal is to have all available under VectorNode::make, so that they can be generated simply with the vector opcode. This is helpful for the plans with the cost-model, where the VTransform nodes will only carry the vector-opc, and I need to generate vectors for these vector-opc. This pull request has now been integrated. Changeset: 08debd33 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/08debd335e9160d20b87e06a2e83ddedd5f473b8 Stats: 48 lines in 2 files changed: 22 ins; 15 del; 11 mod 8346993: C2 SuperWord: refactor to make more vector nodes available in VectorNode::make Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22917 From thartmann at openjdk.org Tue Jan 7 07:51:16 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 7 Jan 2025 07:51:16 GMT Subject: [jdk24] RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 Message-ID: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> Hi all, This pull request contains a backport of commit [874d68a9](https://github.com/openjdk/jdk/commit/874d68a96ce67caaf944dd25fbfb44eab965dfd3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Roland Westrelin on 6 Dec 2024 and was reviewed by Emanuel Peter and Vladimir Kozlov. Thanks! 
------------- Commit messages: - Backport 874d68a96ce67caaf944dd25fbfb44eab965dfd3 Changes: https://git.openjdk.org/jdk/pull/22938/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22938&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343747 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22938.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22938/head:pull/22938 PR: https://git.openjdk.org/jdk/pull/22938 From chagedorn at openjdk.org Tue Jan 7 07:55:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Jan 2025 07:55:36 GMT Subject: [jdk24] RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> References: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> Message-ID: On Tue, 7 Jan 2025 07:46:02 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [874d68a9](https://github.com/openjdk/jdk/commit/874d68a96ce67caaf944dd25fbfb44eab965dfd3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 6 Dec 2024 and was reviewed by Emanuel Peter and Vladimir Kozlov. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22938#pullrequestreview-2533595410 From thartmann at openjdk.org Tue Jan 7 07:58:34 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 7 Jan 2025 07:58:34 GMT Subject: [jdk24] RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> References: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> Message-ID: On Tue, 7 Jan 2025 07:46:02 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [874d68a9](https://github.com/openjdk/jdk/commit/874d68a96ce67caaf944dd25fbfb44eab965dfd3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 6 Dec 2024 and was reviewed by Emanuel Peter and Vladimir Kozlov. > > Thanks! Thanks for the review, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22938#issuecomment-2574609518 From rehn at openjdk.org Tue Jan 7 08:23:44 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 7 Jan 2025 08:23:44 GMT Subject: RFR: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: On Sat, 28 Dec 2024 02:37:18 GMT, Fei Yang wrote: > Hi, please review this change. > > Here is the change history. [JDK-8293011](https://bugs.openjdk.org/browse/JDK-8293011) shared stubs to interpreter for static calls. And [JDK-8293770](https://bugs.openjdk.org/browse/JDK-8293770) further reused runtime call trampolines. So we have `static constexpr bool supports_shared_stubs() { return true; }`. And both cases are handled in `CodeBuffer::pd_finalize_stubs`. 
> > > bool CodeBuffer::pd_finalize_stubs() { > return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests) > && emit_shared_trampolines(this, _shared_trampoline_requests); > } > > > Then [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) turned off uses of trampolines for far calls by default and changed this function into: `static bool supports_shared_stubs() { return UseTrampolines; }`. This will cause the two test failures as option `UseTrampolines` is off by default. Further, [JDK-8343430](https://bugs.openjdk.org/browse/JDK-8343430) removed old trampoline call and option `UseTrampolines` as well. So now we have `static bool supports_shared_stubs() { return false; }` and a simplified `CodeBuffer::pd_finalize_stubs`. > > > bool CodeBuffer::pd_finalize_stubs() { > return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests); > } > > > But [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) is supposed to only make reusing of the runtime call trampolines depend on option `UseTrampolines`. It should not affect the use of shared stubs to interpreter for static calls. So this restores `CodeBuffer::pd_finalize_stubs` letting it return true and disables SharedTrampolineTest.java test for this platform. Tagging @robehn. > > Testing on Premier P550 SBC: > - [x] SharedStubToInterpTest.java (fastdebug) > - [x] Tier1-3 and gtest:all (release) Thanks! ------------- Marked as reviewed by rehn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22888#pullrequestreview-2533652334 From jbhateja at openjdk.org Tue Jan 7 08:58:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Jan 2025 08:58:12 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: > Adding following IR transforms for unsigned vector Min / Max nodes. > > => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) > => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) > => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) > => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) > => UMaxV (a, a) => a > => UMinV (a, a) => a > > New IR validation test accompanies the patch. > > This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
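As a quick sanity check of the six transforms listed above, the same identities can be tested on scalar ints, treating each vector lane independently. This is a minimal sketch under that assumption; `umin`, `umax` and `holds` are hypothetical helpers built on `Integer.compareUnsigned`, not the C2 nodes or the IR test from the patch.

```java
public class UnsignedMinMaxIdentities {
    // Scalar stand-ins for one lane of UMinV / UMaxV.
    static int umin(int a, int b) { return Integer.compareUnsigned(a, b) <= 0 ? a : b; }
    static int umax(int a, int b) { return Integer.compareUnsigned(a, b) >= 0 ? a : b; }

    // Checks every identity from the PR description for one pair (a, b).
    static boolean holds(int a, int b) {
        return umin(umin(a, b), umax(a, b)) == umin(a, b)
            && umin(umin(a, b), umax(b, a)) == umin(a, b)
            && umax(umin(a, b), umax(a, b)) == umax(a, b)
            && umax(umin(a, b), umax(b, a)) == umax(a, b)
            && umax(a, a) == a
            && umin(a, a) == a;
    }

    public static void main(String[] args) {
        // Include values whose signed and unsigned orderings differ.
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 42};
        boolean all = true;
        for (int a : samples) {
            for (int b : samples) {
                all &= holds(a, b);
            }
        }
        System.out.println("all identities hold: " + all);
    }
}
```

The sample set deliberately contains -1 and `Integer.MIN_VALUE`, which are large values in unsigned order, so a signed min/max slipped in by mistake would show up as a wrong result in `umin`/`umax` rather than pass silently.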
The pull request contains five additional commits since the last revision: - Updating copyright year of modified files - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - Update IR transforms and tests - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 - 8342676: Unsigned Vector Min / Max transforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21604/files - new: https://git.openjdk.org/jdk/pull/21604/files/7c035802..cc39220a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21604&range=00-01 Stats: 5813 lines in 147 files changed: 3939 ins; 1462 del; 412 mod Patch: https://git.openjdk.org/jdk/pull/21604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21604/head:pull/21604 PR: https://git.openjdk.org/jdk/pull/21604 From mli at openjdk.org Tue Jan 7 09:17:37 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 7 Jan 2025 09:17:37 GMT Subject: RFR: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: <3pHtsxe5IeUk7PB-7J1DX_FhbDKgzGk9IPyri_bHNN8=.bb9e1e3a-9a79-4ab8-a5e9-d1dedac2b266@github.com> On Sat, 28 Dec 2024 02:37:18 GMT, Fei Yang wrote: > Hi, please review this change. > > Here is the change history. [JDK-8293011](https://bugs.openjdk.org/browse/JDK-8293011) shared stubs to interpreter for static calls. And [JDK-8293770](https://bugs.openjdk.org/browse/JDK-8293770) further reused runtime call trampolines. So we have `static constexpr bool supports_shared_stubs() { return true; }`. And both cases are handled in `CodeBuffer::pd_finalize_stubs`. 
> > > bool CodeBuffer::pd_finalize_stubs() { > return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests) > && emit_shared_trampolines(this, _shared_trampoline_requests); > } > > > Then [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) turned off uses of trampolines for far calls by default and changed this function into: `static bool supports_shared_stubs() { return UseTrampolines; }`. This will cause the two test failures as option `UseTrampolines` is off by default. Further, [JDK-8343430](https://bugs.openjdk.org/browse/JDK-8343430) removed old trampoline call and option `UseTrampolines` as well. So now we have `static bool supports_shared_stubs() { return false; }` and a simplified `CodeBuffer::pd_finalize_stubs`. > > > bool CodeBuffer::pd_finalize_stubs() { > return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests); > } > > > But [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) is supposed to only make reusing of the runtime call trampolines depend on option `UseTrampolines`. It should not affect the use of shared stubs to interpreter for static calls. So this restores `CodeBuffer::pd_finalize_stubs` letting it return true and disables SharedTrampolineTest.java test for this platform. Tagging @robehn. > > Testing on Premier P550 SBC: > - [x] SharedStubToInterpTest.java (fastdebug) > - [x] Tier1-3 and gtest:all (release) Looks good. ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22888#pullrequestreview-2533783034 From tweidmann at openjdk.org Tue Jan 7 09:18:34 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 09:18:34 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation Message-ID: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. 
The goals remain the same: > For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. > > I would like to start a collection of such generators, that can then be used in testing. > > The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. > > This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. > > Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 The refactoring makes use of generics, rendering the generators library more flexible by default: it can work with arbitrary types (with special features for Comparable types), different generators compose more easily, and the client API is streamlined for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. ------------- Commit messages: - Improve documentation - Fix typo - Add copyright - Add new lines - Improve documentation, naming, more restrictable - Update SingleValueGenerator.java - Delete patches.iml - Implement most changes - Some refactoring - use Generators.Random, rm some imports - ...
and 6 more: https://git.openjdk.org/jdk/compare/28ae281b...36066fa6 Changes: https://git.openjdk.org/jdk/pull/22941/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346107 Stats: 1816 lines in 21 files changed: 1816 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From thartmann at openjdk.org Tue Jan 7 09:52:44 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 7 Jan 2025 09:52:44 GMT Subject: [jdk24] Integrated: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> References: <5HSy2Q_zyxPmkKFa3qix_6fzeBkKf4PsseNpgw-UPlE=.72418433-72d9-4f0b-839b-56e4be3fcc37@github.com> Message-ID: On Tue, 7 Jan 2025 07:46:02 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [874d68a9](https://github.com/openjdk/jdk/commit/874d68a96ce67caaf944dd25fbfb44eab965dfd3) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Roland Westrelin on 6 Dec 2024 and was reviewed by Emanuel Peter and Vladimir Kozlov. > > Thanks! This pull request has now been integrated. 
Changeset: 256856a5 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/256856a5a18813eb13b6667c68b154b984afe6f3 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 Reviewed-by: chagedorn Backport-of: 874d68a96ce67caaf944dd25fbfb44eab965dfd3 ------------- PR: https://git.openjdk.org/jdk/pull/22938 From dlunden at openjdk.org Tue Jan 7 10:04:37 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 7 Jan 2025 10:04:37 GMT Subject: RFR: 8341697: C2: Register allocation inefficiency in tight loop [v7] In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 14:17:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch improves the spill placement in the presence of loops. Currently, when trying to spill a live range, we will create a `Phi` at the loop head; this `Phi` will then be spilt inside the loop body, and as the `Phi` is `UP` (lives in register) at the loop head, we need to emit an additional reload at the loop back-edge block. This introduces loop-carried dependencies, greatly reducing loop throughput. >> >> My proposal is to be aware of loop heads and try to eagerly spill or reload live ranges at the loop entries. In general, if a live range is spilt in the loop common path, then we should spill it in the loop entries and reload it at its use sites; this may increase the number of loads but will eliminate loop-carried dependencies, making the load latency-free. On the other hand, if a live range is only spilt in the uncommon path but is used in the common path, then we should reload it eagerly. I think it is appropriate to bias towards spilling, i.e. if a live range is both spilt and reloaded in the common path, we spill it. This eliminates loop-carried dependencies. >> >> A downside of this algorithm is that we may overspill, which means that after spilling some live ranges, the others do not need to be spilt anymore but are unnecessarily spilt.
>> >> - A possible approach is to split the live ranges one-by-one and try to colour them afterwards. This seems prohibitively expensive. >> - Another approach is to be aware of the number of registers that need spilling, sorting the live ones accordingly. >> - Finally, we can eagerly split a live range at uncommon branches and do conservative coalescing afterwards. I think this is the most elegant and efficient solution for that. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix uncommon_freq Keep active. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21472#issuecomment-2574868682 From fyang at openjdk.org Tue Jan 7 10:34:36 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 Jan 2025 10:34:36 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 11:03:09 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support. See ModLNode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. >> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tunning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). 
>> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Can we have a Reviewer then? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22874#issuecomment-2574940551 From fyang at openjdk.org Tue Jan 7 10:58:42 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 Jan 2025 10:58:42 GMT Subject: Integrated: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: On Sat, 28 Dec 2024 02:37:18 GMT, Fei Yang wrote: > Hi, please review this change. > > Here is the change history. [JDK-8293011](https://bugs.openjdk.org/browse/JDK-8293011) shared stubs to interpreter for static calls. And [JDK-8293770](https://bugs.openjdk.org/browse/JDK-8293770) further reused runtime call trampolines. So we have `static constexpr bool supports_shared_stubs() { return true; }`. And both cases are handled in `CodeBuffer::pd_finalize_stubs`. > > > bool CodeBuffer::pd_finalize_stubs() { > return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests) > && emit_shared_trampolines(this, _shared_trampoline_requests); > } > > > Then [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) turned off uses of trampolines for far calls by default and changed this function into: `static bool supports_shared_stubs() { return UseTrampolines; }`. This will cause the two test failures as option `UseTrampolines` is off by default. Further, [JDK-8343430](https://bugs.openjdk.org/browse/JDK-8343430) removed old trampoline call and option `UseTrampolines` as well. So now we have `static bool supports_shared_stubs() { return false; }` and a simplified `CodeBuffer::pd_finalize_stubs`. 
> > > bool CodeBuffer::pd_finalize_stubs() { > return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests); > } > > > But [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) is supposed to only make reusing of the runtime call trampolines depend on option `UseTrampolines`. It should not affect the use of shared stubs to interpreter for static calls. So this restores `CodeBuffer::pd_finalize_stubs` letting it return true and disables SharedTrampolineTest.java test for this platform. Tagging @robehn. > > Testing on Premier P550 SBC: > - [x] SharedStubToInterpTest.java (fastdebug) > - [x] Tier1-3 and gtest:all (release) This pull request has now been integrated. Changeset: 3f7052ed Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/3f7052ed7af89efd1c6977df0b4f3b95fcfec764 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 Reviewed-by: rehn, mli ------------- PR: https://git.openjdk.org/jdk/pull/22888 From fyang at openjdk.org Tue Jan 7 10:58:41 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 Jan 2025 10:58:41 GMT Subject: RFR: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 08:20:56 GMT, Robbin Ehn wrote: >> Hi, please review this change. >> >> Here is the change history. [JDK-8293011](https://bugs.openjdk.org/browse/JDK-8293011) shared stubs to interpreter for static calls. And [JDK-8293770](https://bugs.openjdk.org/browse/JDK-8293770) further reused runtime call trampolines. So we have `static constexpr bool supports_shared_stubs() { return true; }`. And both cases are handled in `CodeBuffer::pd_finalize_stubs`. 
>> >> >> bool CodeBuffer::pd_finalize_stubs() { >> return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests) >> && emit_shared_trampolines(this, _shared_trampoline_requests); >> } >> >> >> Then [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) turned off uses of trampolines for far calls by default and changed this function into: `static bool supports_shared_stubs() { return UseTrampolines; }`. This will cause the two test failures as option `UseTrampolines` is off by default. Further, [JDK-8343430](https://bugs.openjdk.org/browse/JDK-8343430) removed old trampoline call and option `UseTrampolines` as well. So now we have `static bool supports_shared_stubs() { return false; }` and a simplified `CodeBuffer::pd_finalize_stubs`. >> >> >> bool CodeBuffer::pd_finalize_stubs() { >> return emit_shared_stubs_to_interp(this, _shared_stub_to_interp_requests); >> } >> >> >> But [JDK-8332689](https://bugs.openjdk.org/browse/JDK-8332689) is supposed to only make reusing of the runtime call trampolines depend on option `UseTrampolines`. It should not affect the use of shared stubs to interpreter for static calls. So this restores `CodeBuffer::pd_finalize_stubs` letting it return true and disables SharedTrampolineTest.java test for this platform. Tagging @robehn. >> >> Testing on Premier P550 SBC: >> - [x] SharedStubToInterpTest.java (fastdebug) >> - [x] Tier1-3 and gtest:all (release) > > Thanks! @robehn @Hamlin-Li : Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22888#issuecomment-2574985100 From fyang at openjdk.org Tue Jan 7 11:06:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 7 Jan 2025 11:06:46 GMT Subject: [jdk24] RFR: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 Message-ID: Hi all, This pull request contains a backport of commit [3f7052ed](https://github.com/openjdk/jdk/commit/3f7052ed7af89efd1c6977df0b4f3b95fcfec764) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
The commit being backported was authored by Fei Yang on 7 Jan 2025 and was reviewed by Robbin Ehn and Hamlin Li. Thanks! ------------- Commit messages: - Backport 3f7052ed7af89efd1c6977df0b4f3b95fcfec764 Changes: https://git.openjdk.org/jdk/pull/22945/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22945&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346868 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22945.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22945/head:pull/22945 PR: https://git.openjdk.org/jdk/pull/22945 From tholenstein at openjdk.org Tue Jan 7 12:03:43 2025 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 7 Jan 2025 12:03:43 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v9] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 21:01:24 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: >> >> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 >> git checkout pull/22438 >> >> >> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. >> >> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. >> >> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
>> >> ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update LayoutGraph.java Can someone please re-review? thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22438#issuecomment-2575110516 From tweidmann at openjdk.org Tue Jan 7 12:53:37 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 12:53:37 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v3] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 06:53:21 GMT, Tobias Hartmann wrote: > Please make sure to run testing with javac flag `-XDstringConcat=inline` to have it use StringBuffer instead of invokedynamic based string concat (see [JEP 280](https://openjdk.org/jeps/280)). As we discussed, this did not lead to any failures related to this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22537#issuecomment-2575226727 From epeter at openjdk.org Tue Jan 7 13:14:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 13:14:01 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v3] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 13:07:32 GMT, Roland Westrelin wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: >> >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - make sort stable >> - some comment and naming improvements >> - ... and 104 more: https://git.openjdk.org/jdk/compare/31ceec7c...4b0504d0 > > This is tricky to review but looks reasonable to me. 
@rwestrel Do you intend to review this patch, or did you only place some drive-through comments? Just asking so I know if I should ask someone else to review. @chhagedorn @rwestrel @vnkozlov FYI: I created this "Overview" for all the SuperWord issues (Press [F] for fullscreen mode): https://eme64.github.io/blog/2025/01/01/AutoVectorization-Status.html ------------- PR Comment: https://git.openjdk.org/jdk/pull/21926#issuecomment-2575264017 From chagedorn at openjdk.org Tue Jan 7 13:51:51 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 7 Jan 2025 13:51:51 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v9] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 21:01:24 GMT, Tobias Holenstein wrote: >> This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: >> >> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 >> git checkout pull/22438 >> >> >> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. >> >> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. >> >> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. >> >> ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Update LayoutGraph.java Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2534386269 From epeter at openjdk.org Tue Jan 7 14:02:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 14:02:42 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: On Tue, 7 Jan 2025 08:58:17 GMT, Theo Weidmann wrote: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Nice work! I have a few comments in the code. 
I am also wondering if we can still do this: - pick a random int distribution -> suppose we get mixed special + uniform - now sample from different int ranges, all from that same underlying distribution: - [0, max] - [10, 20] - ... I think it is not possible: gen = Generators.ints(); // gen is not restrictable, sadly... but I would like to do gen1 = gen.restrict(0, max); v1 = gen1.next(); gen2 = gen.restrict(10, 20); v2 = gen2.next() If that is indeed not possible: How can we ensure the continuity of the distribution across different range restrictions, if we want to pick a random distribution? test/hotspot/jtreg/compiler/lib/generators/EmptyGeneratorException.java line 29: > 27: * An EmptyGeneratorException is thrown if a generator configuration is requested that would result in an empty > 28: * set of values. For example, bounds such as [1, 0] cause an EmptyGeneratorException. Another example would be > 29: * restricting a uniform integer generator over the range [0, 1] to [10, 11]. What if I mix distributions, and one of them has no values from that range, but the other does? test/hotspot/jtreg/compiler/lib/generators/Generator.java line 33: > 31: * Returns the next value from the stream. > 32: */ > 33: T next(); Should this not have a `@return`? Why don't you run something like: `/oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.generators` And make sure you don't have any errors/warnings :) test/hotspot/jtreg/compiler/lib/generators/GeneratorBase.java line 26: > 24: package compiler.lib.generators; > 25: > 26: abstract class GeneratorBase implements Generator { Can you add a comment what this is for? test/hotspot/jtreg/compiler/lib/generators/Generators.java line 39: > 37: * optimizations. > 38: *
> 39: * Normally, clients get the default Generators instance by referring to the static variable {@link #G}. It would be nice to have an example test that uses it as you would expect. test/hotspot/jtreg/compiler/lib/generators/Generators.java line 68: > 66: * > 67: *
> 68: * If there is a single value that is interesting as the to all three parameters, we might even call this "as the to all" Looks like a typo? test/hotspot/jtreg/compiler/lib/generators/Generators.java line 257: > 255: } > 256: > 257: public Generator mixedWithSpecialInts(int weightA, int weightB, int rangeSpecial) { Suggestion: public Generator uniformMixedWithSpecialInts(int weightA, int weightB, int rangeSpecial) { Optional, you can also leave it as is. test/hotspot/jtreg/compiler/lib/generators/RandomnessSource.java line 28: > 26: /** > 27: * Defines the underlying randomness source used by the generators. This is essentially a subset of > 28: * {@link java.util.random.RandomGenerator} and the present methods have the same contract. Why do you need this? For testing? -> Add comment about that. test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 78: > 76: Asserts.assertEQ(g.next(), 4); > 77: Asserts.assertEQ(g.next(), 18); > 78: } It would be nice if you told us / a future person who extends this, what this mocking does, and how it works. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22941#pullrequestreview-2534311495 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905439519 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905441799 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905442788 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905452818 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905455587 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905463832 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905473446 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905484613 From tweidmann at openjdk.org Tue Jan 7 14:11:39 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 14:11:39 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: On Tue, 7 Jan 2025 13:15:04 GMT, Emanuel Peter wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. 
>>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > test/hotspot/jtreg/compiler/lib/generators/EmptyGeneratorException.java line 29: > >> 27: * An EmptyGeneratorException is thrown if a generator configuration is requested that would result in an empty >> 28: * set of values. For example, bounds such as [1, 0] cause an EmptyGeneratorException. Another example would be >> 29: * restricting a uniform integer generator over the range [0, 1] to [10, 11]. > > What if I mix distributions, and one of them has no values from that range, but the other does? Currently the mixed generator does not even support restricting, but I think that's something that I should still add, also with regard to your comment above. > What if I mix distributions, and one of them has no values from that range, but the other does? I would make the mixed generator call `restricted` on both its generators, which might in turn throw EmptyGeneratorException, if they cannot be restricted to that range. > test/hotspot/jtreg/compiler/lib/generators/Generators.java line 257: > >> 255: } >> 256: >> 257: public Generator mixedWithSpecialInts(int weightA, int weightB, int rangeSpecial) { > > Suggestion: > > public Generator uniformMixedWithSpecialInts(int weightA, int weightB, int rangeSpecial) { > > Optional, you can also leave it as is. Oh, yeah, agreed. I wanted to change this and then forgot. 
> test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 78: > >> 76: Asserts.assertEQ(g.next(), 4); >> 77: Asserts.assertEQ(g.next(), 18); >> 78: } > > It would be nice if you told us / a future person who extends this, what this mocking does, and how it works. Do you mean specifically how this test here works or how the mocking works? Or both? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905505679 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905507125 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905507808 From epeter at openjdk.org Tue Jan 7 14:11:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 14:11:39 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> On Tue, 7 Jan 2025 14:05:33 GMT, Theo Weidmann wrote: >> test/hotspot/jtreg/compiler/lib/generators/EmptyGeneratorException.java line 29: >> >>> 27: * An EmptyGeneratorException is thrown if a generator configuration is requested that would result in an empty >>> 28: * set of values. For example, bounds such as [1, 0] cause an EmptyGeneratorException. Another example would be >>> 29: * restricting a uniform integer generator over the range [0, 1] to [10, 11]. >> >> What if I mix distributions, and one of them has no values from that range, but the other does? > > Currently the mixed generator does not even support restricting, but I think that's something that I should still add, also with regard to your comment above. > >> What if I mix distributions, and one of them has no values from that range, but the other does? 
> > I would make the mixed generator call `restricted` on both its generators, which might in turn throw EmptyGeneratorException, if they cannot be restricted to that range. You could fix it this way: when you restrict mixed generators, you try to recursively restrict all its sub-generators. If one cannot be restricted, you do not throw, but just remove it from the mixed distribution ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905508138 From epeter at openjdk.org Tue Jan 7 14:11:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 14:11:40 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> Message-ID: On Tue, 7 Jan 2025 14:07:33 GMT, Emanuel Peter wrote: >> Currently the mixed generator does not even support restricting, but I think that's something that I should still add, also with regard to your comment above. >> >>> What if I mix distributions, and one of them has no values from that range, but the other does? >> >> I would make the mixed generator call `restricted` on both its generators, which might in turn throw EmptyGeneratorException, if they cannot be restricted to that range. > > You could fix it this way: when you restrict mixed generators, you try to recursively restrict all its sub-generators. 
If one cannot be restricted, you do not throw, but just remove it from the mixed distribution ;) Only if none remain -> throw ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905508443 From tweidmann at openjdk.org Tue Jan 7 14:11:40 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 14:11:40 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> Message-ID: On Tue, 7 Jan 2025 14:07:46 GMT, Emanuel Peter wrote: >> You could fix it this way: when you restrict mixed generators, you try to recursively restrict all its sub-generators. If one cannot be restricted, you do not throw, but just remove it from the mixed distribution ;) > > Only if none remain -> throw Also sounds good to me. You have a better overview over the use-cases, so I'll just go with that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905510579 From epeter at openjdk.org Tue Jan 7 14:17:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 14:17:37 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> Message-ID: On Tue, 7 Jan 2025 14:09:14 GMT, Theo Weidmann wrote: >> Only if none remain -> throw > > Also sounds good to me. You have a better overview over the use-cases, so I'll just go with that :) Hmm. But what if I at some point get the "special" distribution from `Generators.ints`, but then want to draw from a range that has no elements? I kinda need that to not throw for the Templates. 
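A rough Java sketch of the rule proposed in this exchange: restricting a mixed generator restricts each sub-generator, silently drops those that become empty, and throws only if none remain. This is an illustration under stated assumptions, not the code under review: `IntGen`, `Uniform`, `Special`, `Mixed`, and the local `EmptyGeneratorException` are hypothetical stand-ins, and only the names `restricted`, `next`, and `EmptyGeneratorException` are borrowed from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class MixedRestrictSketch {
    /** Local stand-in, named after the exception discussed in the thread. */
    static class EmptyGeneratorException extends RuntimeException {}

    /** Hypothetical minimal interface, not the API under review. */
    interface IntGen {
        int next();
        /** Restricts to the inclusive range [lo, hi]; throws if nothing is left. */
        IntGen restricted(int lo, int hi);
    }

    /** Uniform over the inclusive range [lo, hi]. */
    static final class Uniform implements IntGen {
        private final int lo, hi;
        private final Random rnd;
        Uniform(int lo, int hi, Random rnd) {
            if (lo > hi) throw new EmptyGeneratorException();
            this.lo = lo; this.hi = hi; this.rnd = rnd;
        }
        public int next() {
            long span = (long) hi - lo + 1; // long arithmetic avoids int overflow
            // Slight modulo bias; acceptable for a sketch.
            return (int) (lo + Math.floorMod(rnd.nextLong(), span));
        }
        public IntGen restricted(int newLo, int newHi) {
            return new Uniform(Math.max(lo, newLo), Math.min(hi, newHi), rnd);
        }
    }

    /** Picks uniformly from a fixed list of "special" values. */
    static final class Special implements IntGen {
        private final List<Integer> values;
        private final Random rnd;
        Special(List<Integer> values, Random rnd) {
            if (values.isEmpty()) throw new EmptyGeneratorException();
            this.values = values; this.rnd = rnd;
        }
        public int next() { return values.get(rnd.nextInt(values.size())); }
        public IntGen restricted(int lo, int hi) {
            List<Integer> kept = new ArrayList<>();
            for (int v : values) if (lo <= v && v <= hi) kept.add(v);
            return new Special(kept, rnd); // throws if no special value is in range
        }
    }

    /** Weighted mix of sub-generators; weights are assumed positive. */
    static final class Mixed implements IntGen {
        private final List<IntGen> parts;
        private final List<Integer> weights;
        private final Random rnd;
        Mixed(List<IntGen> parts, List<Integer> weights, Random rnd) {
            if (parts.isEmpty()) throw new EmptyGeneratorException();
            this.parts = parts; this.weights = weights; this.rnd = rnd;
        }
        int size() { return parts.size(); }
        public int next() {
            int total = 0;
            for (int w : weights) total += w;
            int r = rnd.nextInt(total);
            for (int i = 0; i < parts.size(); i++) {
                r -= weights.get(i);
                if (r < 0) return parts.get(i).next();
            }
            throw new IllegalStateException("unreachable for positive weights");
        }
        public IntGen restricted(int lo, int hi) {
            List<IntGen> keptParts = new ArrayList<>();
            List<Integer> keptWeights = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                try {
                    keptParts.add(parts.get(i).restricted(lo, hi));
                    keptWeights.add(weights.get(i));
                } catch (EmptyGeneratorException e) {
                    // Sub-generator has no values in [lo, hi]: drop it instead of failing.
                }
            }
            return new Mixed(keptParts, keptWeights, rnd); // throws only if none remain
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        IntGen mixed = new Mixed(
            List.<IntGen>of(new Special(List.of(-1, 0, 1), rnd), new Uniform(0, 100, rnd)),
            List.of(1, 3), rnd);
        IntGen narrowed = mixed.restricted(10, 20); // special part dropped, uniform part kept
        System.out.println(narrowed.next());
    }
}
```

In this sketch the surviving sub-generators keep their original relative weights, which is one possible choice; the thread does not settle how weights should be renormalized.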
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905518088 From tweidmann at openjdk.org Tue Jan 7 14:21:35 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 14:21:35 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> Message-ID: On Tue, 7 Jan 2025 14:14:41 GMT, Emanuel Peter wrote: >> Also sounds good to me. You have a better overview over the use-cases, so I'll just go with that :) > > Hmm. But what if I at some point get the "special" distribution from `Generators.ints`, but then want to draw from a range that has no elements? I kinda need that to not throw for the Templates. But what is it good for to draw from a range with no elements? What is supposed to happen then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905524507 From epeter at openjdk.org Tue Jan 7 14:25:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 14:25:41 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> Message-ID: On Tue, 7 Jan 2025 14:19:28 GMT, Theo Weidmann wrote: >> Hmm. But what if I at some point get the "special" distribution from `Generators.ints`, but then want to draw from a range that has no elements? I kinda need that to not throw for the Templates. > > But what is it good for to draw from a range with no elements? What is supposed to happen then? Well, it is more about this: I want to be able to draw from restricted ranges in the Templates. 
But the distribution should be random. Maybe the solution is just to make sure that for `Generators.ints`, we always mix in uniform, but at a very low weight. That way, if all other sub-distribution of a mixed distribution fall away (empty), we at least still have the uniform distribution. Because if the template wants a range, then we must sample something from that range. Does that make sense? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905529209 From tholenstein at openjdk.org Tue Jan 7 14:32:59 2025 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 7 Jan 2025 14:32:59 GMT Subject: Integrated: 8345041: IGV: Free Placement Mode in IGV Layout In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 13:45:45 GMT, Tobias Holenstein wrote: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. > > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) This pull request has now been integrated. 
Changeset: e5f0c190 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/e5f0c19084dcb5f16a5e7665f98005a35173f61d Stats: 677 lines in 12 files changed: 649 ins; 7 del; 21 mod 8345041: IGV: Free Placement Mode in IGV Layout Reviewed-by: chagedorn, epeter, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22438 From syan at openjdk.org Tue Jan 7 15:09:46 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 7 Jan 2025 15:09:46 GMT Subject: RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> References: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> Message-ID: On Mon, 6 Jan 2025 12:41:13 GMT, SendaoYan wrote: > Hi all, > There are 4 tests fails run with JVM option '-XX:+SegmentedCodeCache'. JVM option '-XX:ReservedCodeCacheSize=4m' [inside test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/ciReplay/CiReplayBase.java#L68) conflict with JVM option '-XX:+SegmentedCodeCache' which pass from outside test. This PR add '-XX:-SegmentedCodeCache' explicitly inside test to make test run success whenever received '-XX:+SegmentedCodeCache' outside or not. Change has been verified locally, test-fix only, make test more robustness, no risk. 
> > 'SegmentedCodeCache' is enabled by default > >> java -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 251658240 {pd product} {ergonomic} > bool SegmentedCodeCache = true {product} {ergonomic} > > > Passing '-XX:ReservedCodeCacheSize=4m' to the JVM automatically disables `SegmentedCodeCache` > >> java -XX:ReservedCodeCacheSize=4m -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 4194304 {pd product} {command line} > bool SegmentedCodeCache = false {product} {default} > > > '-XX:+SegmentedCodeCache' conflicts with '-XX:ReservedCodeCacheSize=4m' > >> java -XX:ReservedCodeCacheSize=4m -XX:+SegmentedCodeCache -version > Error occurred during initialization of VM > Invalid code heap sizes: NonNMethodCodeHeapSize (8006K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (4K) = 8014K is greater than ReservedCodeCacheSize (4096K). Thanks all for the advice and review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22926#issuecomment-2575528557 From syan at openjdk.org Tue Jan 7 15:09:47 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 7 Jan 2025 15:09:47 GMT Subject: Integrated: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> References: <25ygZomJjdSYgcc5KcxtosY_836TVfXdA9rlehJRTrs=.80e764ac-6fea-4b8f-bbb8-bf37a9d23ffb@github.com> Message-ID: <-Owf56KIUn2KBpN42TcHgq7M2ZAGTVQTqLB2-zmZwxU=.89c3e7c2-a8d7-4648-ae13-d0a72709c12a@github.com> On Mon, 6 Jan 2025 12:41:13 GMT, SendaoYan wrote: > Hi all, > There are 4 tests that fail when run with JVM option '-XX:+SegmentedCodeCache'.
The JVM option '-XX:ReservedCodeCacheSize=4m' [inside the test](https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/ciReplay/CiReplayBase.java#L68) conflicts with the JVM option '-XX:+SegmentedCodeCache' passed from outside the test. This PR adds '-XX:-SegmentedCodeCache' explicitly inside the test so that it succeeds whether or not '-XX:+SegmentedCodeCache' is passed from outside. The change has been verified locally; it is a test-only fix that makes the test more robust, with no risk. > > 'SegmentedCodeCache' is enabled by default > >> java -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 251658240 {pd product} {ergonomic} > bool SegmentedCodeCache = true {product} {ergonomic} > > > Passing '-XX:ReservedCodeCacheSize=4m' to the JVM automatically disables `SegmentedCodeCache` > >> java -XX:ReservedCodeCacheSize=4m -XX:+PrintFlagsFinal 2>&1 | grep -P "(SegmentedCodeCache)|(ReservedCodeCacheSize)" > uintx ReservedCodeCacheSize = 4194304 {pd product} {command line} > bool SegmentedCodeCache = false {product} {default} > > > '-XX:+SegmentedCodeCache' conflicts with '-XX:ReservedCodeCacheSize=4m' > >> java -XX:ReservedCodeCacheSize=4m -XX:+SegmentedCodeCache -version > Error occurred during initialization of VM > Invalid code heap sizes: NonNMethodCodeHeapSize (8006K) + ProfiledCodeHeapSize (4K) + NonProfiledCodeHeapSize (4K) = 8014K is greater than ReservedCodeCacheSize (4096K). This pull request has now been integrated.
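The VM error quoted above is a plain size check: the three code heap segments must fit inside the reserved code cache. A minimal Java sketch of that arithmetic follows, using only the numbers from the quoted error message. The class and method names (`CodeHeapSizeCheck`, `totalKb`, `fits`) are hypothetical illustrations, not HotSpot code, and the sketch does not model how the VM derives the segment sizes.

```java
// Hypothetical illustration of the size check reported in the VM error message.
class CodeHeapSizeCheck {
    /** Sums the given segment sizes (all in KB). */
    static long totalKb(long... segmentSizesKb) {
        long total = 0;
        for (long size : segmentSizesKb) {
            total += size;
        }
        return total;
    }

    /** Returns true when the segments fit into the reserved code cache (sizes in KB). */
    static boolean fits(long reservedKb, long... segmentSizesKb) {
        return totalKb(segmentSizesKb) <= reservedKb;
    }

    public static void main(String[] args) {
        // Values copied from the error message: NonNMethod, Profiled, NonProfiled.
        long nonNMethodKb = 8006;
        long profiledKb = 4;
        long nonProfiledKb = 4;
        long reservedKb = 4096; // -XX:ReservedCodeCacheSize=4m

        long total = totalKb(nonNMethodKb, profiledKb, nonProfiledKb);
        System.out.println(total + "K fits into " + reservedKb + "K: "
                + fits(reservedKb, nonNMethodKb, profiledKb, nonProfiledKb));
        // prints "8014K fits into 4096K: false"
    }
}
```

With the segmented layout the non-nmethod segment alone (8006K) already exceeds the 4096K reservation, which is why the test has to pass '-XX:-SegmentedCodeCache' alongside the small cache size.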
Changeset: cf3e48e7 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/cf3e48e77172db7e27530af9754e1ead8d493f52 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/22926 From syan at openjdk.org Tue Jan 7 15:19:51 2025 From: syan at openjdk.org (SendaoYan) Date: Tue, 7 Jan 2025 15:19:51 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache Message-ID: Hi all, This pull request contains a backport of commit [cf3e48e7](https://github.com/openjdk/jdk/commit/cf3e48e77172db7e27530af9754e1ead8d493f52) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by SendaoYan on 7 Jan 2025 and was reviewed by Vladimir Kozlov. Thanks! ------------- Commit messages: - Backport cf3e48e77172db7e27530af9754e1ead8d493f52 Changes: https://git.openjdk.org/jdk/pull/22950/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22950&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346965 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22950.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22950/head:pull/22950 PR: https://git.openjdk.org/jdk/pull/22950 From tweidmann at openjdk.org Tue Jan 7 15:52:42 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 15:52:42 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: On Tue, 7 Jan 2025 13:17:00 GMT, Emanuel Peter wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. 
The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > test/hotspot/jtreg/compiler/lib/generators/Generator.java line 33: > >> 31: * Returns the next value from the stream. >> 32: */ >> 33: T next(); > > Should this not have a `@return`? > > Why don't you run something like: > `/oracle-work/jdk-fork0/build/linux-x64-debug/jdk/bin/javadoc -sourcepath test/hotspot/jtreg:./test/lib compiler.lib.generators` > And make sure you don't have any errors/warnings :) What would you like to see here with @return? I don't think there's any more information I can provide. Do you think the entire doc comment should be like below? In my experience that can be detrimental for some IDEs as they will only show the main text (for lack of a better word) in some circumstances.
```
/**
 * @return the next value from the stream.
 */
```
To get rid of all the warnings would mean documenting every single parameter everywhere (all lo's and hi's). Do you think that's really necessary? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1905662756 From tweidmann at openjdk.org Tue Jan 7 16:02:22 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 16:02:22 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v2] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc.
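To make the `@return` discussion and the "generic, composable generators" idea concrete, here is a small Java sketch. It is a hypothetical stand-in and not the actual `compiler.lib.generators` API: only the `T next()` method appears in the quoted diff, while `GeneratorSketch`, `map` and `cycling` are names assumed for illustration. The doc comment keeps the main description and adds the `@return` tag, which is one way to satisfy javadoc while still giving IDEs a summary sentence to show.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; not the real compiler.lib.generators library.
class GeneratorSketch {
    /** A stream of values of type T; only next() is taken from the quoted diff. */
    interface Generator<T> {
        /**
         * Returns the next value from the stream.
         *
         * @return the next value from the stream
         */
        T next();

        /**
         * Assumed helper: derives a new generator by applying f to every value.
         *
         * @param f   the mapping function
         * @param <R> the type of the mapped values
         * @return a generator producing f(next()) on each call
         */
        default <R> Generator<R> map(Function<? super T, ? extends R> f) {
            return () -> f.apply(next());
        }
    }

    /**
     * Assumed helper: cycles deterministically through a fixed list of "interesting" values.
     *
     * @param values the values to cycle through; must not be empty
     * @param <T>    the value type
     * @return a generator that repeats the given values in order
     */
    static <T> Generator<T> cycling(List<T> values) {
        int[] index = {0};
        return () -> values.get(Math.floorMod(index[0]++, values.size()));
    }

    public static void main(String[] args) {
        // Edge-case ints, widened to long by composing with map().
        Generator<Integer> edgeInts =
                cycling(List.of(Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE));
        Generator<Long> edgeLongs = edgeInts.map(v -> v.longValue());
        for (int i = 0; i < 5; i++) {
            System.out.println(edgeLongs.next());
        }
    }
}
```

Because `Generator` has a single abstract method, lambdas can implement it directly, which keeps user-defined distributions short.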
Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: - Improve phrasing - Naming and documentation improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/36066fa6..bb5074d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=00-01 Stats: 77 lines in 9 files changed: 39 ins; 32 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From shade at openjdk.org Tue Jan 7 16:44:56 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Jan 2025 16:44:56 GMT Subject: RFR: 8347127: CTW fails to build after JDK-8334733 Message-ID: [JDK-8334733](https://bugs.openjdk.org/browse/JDK-8334733) removed the filter for `ModuleInfoWriter`, which now causes standalone CTW to fail when building: $ export JAVA_HOME= $ export PATH=$JAVA_HOME/bin:$PATH $ cd test/hotspot/jtreg/testlibrary/ctw $ make /home/shipilev/shipilev-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/../bin/javac --add-exports java.base/jdk.internal.jimage=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.reflect=ALL-UNNAMED --add-exports java.base/jdk.internal.access=ALL-UNNAMED -sourcepath src -d build/classes -cp dist/wb.jar @filelist ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:44: error: package jdk.internal.module is not visible import jdk.internal.module.ModuleResolution; ^ (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:45: error: package jdk.internal.module is not visible import jdk.internal.module.ModuleTarget; ^ (package jdk.internal.module is declared in 
module java.base, which does not export it to the unnamed module) Note: Some input files use unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. 2 errors make: *** [dist/ctw.jar] Error 1 Additional testing: - [x] CTW `make` works now - [x] Standalone CTW works now ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/22952/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22952&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347127 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22952.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22952/head:pull/22952 PR: https://git.openjdk.org/jdk/pull/22952 From shade at openjdk.org Tue Jan 7 16:52:42 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Jan 2025 16:52:42 GMT Subject: RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 16:47:08 GMT, Chen Liang wrote: > Is there a reason ctw/hotspot-misc is not included in any of the tiers so such failures are never caught by CIs? This is a standalone CTW runner. I think we run CTW through jtreg regularly. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22952#issuecomment-2575779102 From liach at openjdk.org Tue Jan 7 16:52:42 2025 From: liach at openjdk.org (Chen Liang) Date: Tue, 7 Jan 2025 16:52:42 GMT Subject: RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 16:39:43 GMT, Aleksey Shipilev wrote: > [JDK-8334733](https://bugs.openjdk.org/browse/JDK-8334733) removed the filter for `ModuleInfoWriter`, which now causes standalone CTW to fail when building: > > > $ export JAVA_HOME= > $ export PATH=$JAVA_HOME/bin:$PATH > $ cd test/hotspot/jtreg/testlibrary/ctw > $ make > > /home/shipilev/shipilev-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/../bin/javac --add-exports java.base/jdk.internal.jimage=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.reflect=ALL-UNNAMED --add-exports java.base/jdk.internal.access=ALL-UNNAMED -sourcepath src -d build/classes -cp dist/wb.jar @filelist > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:44: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleResolution; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:45: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleTarget; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > Note: Some input files use unchecked or unsafe operations. > Note: Recompile with -Xlint:unchecked for details. > 2 errors > make: *** [dist/ctw.jar] Error 1 > > > Additional testing: > - [x] CTW `make` works now > - [x] Standalone CTW works now Is there a reason ctw/hotspot-misc is not included in any of the tiers so such failures are never caught by CIs? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22952#issuecomment-2575773717 From kvn at openjdk.org Tue Jan 7 16:52:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 Jan 2025 16:52:42 GMT Subject: RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 16:39:43 GMT, Aleksey Shipilev wrote: > [JDK-8334733](https://bugs.openjdk.org/browse/JDK-8334733) removed the filter for `ModuleInfoWriter`, which now causes standalone CTW to fail when building: > > > $ export JAVA_HOME= > $ export PATH=$JAVA_HOME/bin:$PATH > $ cd test/hotspot/jtreg/testlibrary/ctw > $ make > > /home/shipilev/shipilev-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/../bin/javac --add-exports java.base/jdk.internal.jimage=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.reflect=ALL-UNNAMED --add-exports java.base/jdk.internal.access=ALL-UNNAMED -sourcepath src -d build/classes -cp dist/wb.jar @filelist > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:44: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleResolution; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:45: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleTarget; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > Note: Some input files use unchecked or unsafe operations. > Note: Recompile with -Xlint:unchecked for details. > 2 errors > make: *** [dist/ctw.jar] Error 1 > > > Additional testing: > - [x] CTW `make` works now > - [x] Standalone CTW works now Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22952#pullrequestreview-2534909722 From epeter at openjdk.org Tue Jan 7 17:08:35 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 17:08:35 GMT Subject: RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 16:39:43 GMT, Aleksey Shipilev wrote: > [JDK-8334733](https://bugs.openjdk.org/browse/JDK-8334733) removed the filter for `ModuleInfoWriter`, which now causes standalone CTW to fail when building: > > > $ export JAVA_HOME= > $ export PATH=$JAVA_HOME/bin:$PATH > $ cd test/hotspot/jtreg/testlibrary/ctw > $ make > > /home/shipilev/shipilev-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/../bin/javac --add-exports java.base/jdk.internal.jimage=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.reflect=ALL-UNNAMED --add-exports java.base/jdk.internal.access=ALL-UNNAMED -sourcepath src -d build/classes -cp dist/wb.jar @filelist > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:44: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleResolution; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:45: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleTarget; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > Note: Some input files use unchecked or unsafe operations. > Note: Recompile with -Xlint:unchecked for details. > 2 errors > make: *** [dist/ctw.jar] Error 1 > > > Additional testing: > - [x] CTW `make` works now > - [x] Standalone CTW works now Looks good ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22952#pullrequestreview-2534946627 From jsjolen at openjdk.org Tue Jan 7 17:12:46 2025 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 7 Jan 2025 17:12:46 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v25] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 08:29:04 GMT, Theo Weidmann wrote: >> In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. >> >> Concerns were raised by @rwestrel in the previous PR: >> >>> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? >> >> As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). >> >> The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the corresponding test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: >> >> >> 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant >> @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call >> >> >> Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. > > Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: > > - Revert "Add UDivI/L and UModI/L to no_dependent_zero_check" > > This reverts commit b72ff10ba1581372b72edeadd3cf01a97ccf1c73. > - Revert "Update TestSplitDivisionThroughPhi.java" > > This reverts commit 7526bafff4ea26cb45894477b33f3dd24215e667. Hi, It looks good, but maybe we can get rid of `make_stream` by doing something like this:
```c++
class IPInlineAttempt {
  stringStream* _stream;

public:
  IPInlineAttempt() : _stream(new (mtCompiler) stringStream) {}
  IPInlineAttempt(const IPInlineAttempt& other) : _stream(other._stream) {}
  ~IPInlineAttempt() {
    // Doesn't delete _stream on purpose,
    // in order to avoid issues with GA's copy semantics on resize.
  }
  void deallocate_stream() { delete _stream; }
};
```
So we can still use the copy semantics of GA to our advantage at times :-). Unfortunately, the `TreapCHeap` does not call destructors on deallocation. We've only used it for PODs so far, let me fix that before you integrate this. src/hotspot/share/opto/printinlining.hpp line 42: > 40: private: > 41: class IPInlineAttempt { > 42: private: Style: No initial `private:` as it's private by default. src/hotspot/share/opto/printinlining.hpp line 44: > 42: private: > 43: InliningResult _result; > 44: stringStream* _stream = nullptr; Style: Put initialization into constructors.
------------- PR Review: https://git.openjdk.org/jdk/pull/21899#pullrequestreview-2534905667 PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1905770817 PR Review Comment: https://git.openjdk.org/jdk/pull/21899#discussion_r1905785411 From epeter at openjdk.org Tue Jan 7 17:36:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 7 Jan 2025 17:36:37 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 15:14:34 GMT, SendaoYan wrote: > Hi all, > > This pull request contains a backport of commit [cf3e48e7](https://github.com/openjdk/jdk/commit/cf3e48e77172db7e27530af9754e1ead8d493f52) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by SendaoYan on 7 Jan 2025 and was reviewed by Vladimir Kozlov. > > Thanks! @sendaoYan the Affected version of JDK-8346965 only lists JDK25. Since you are backporting to JDK24, this must be incomplete... Can you add all affected versions and link the change that caused the issue? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22950#issuecomment-2575867980 From tweidmann at openjdk.org Tue Jan 7 17:52:40 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 7 Jan 2025 17:52:40 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 21:07:40 GMT, Johan Sjölen wrote: >>> Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? >> >> Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first.
> >> > Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? >> >> Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. > > Don't use `ttyLock`, we really want to get rid of that mechanism. The best would be to port the output to UL, but if that's not possible use a `stringStream` as Dean said. @jdksjolen It's been a while since I was working on this, but if I remember correctly: The problem with the approach you suggest is that GrowableArray will fill the entire allocated buffer by calling the default constructor. Moving the new call into the constructor would therefore cause "n+1" heap allocations every time GrowableArray grows and some of these allocations might never be used. https://github.com/openjdk/jdk/blob/9702accdd9a25e05628d470bf248edd5d80c0c4d/src/hotspot/share/utilities/growableArray.hpp#L521-L534 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2575894097 From jbhateja at openjdk.org Tue Jan 7 17:56:24 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Jan 2025 17:56:24 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing Message-ID: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus obviating any dependency on explicit idealization routines. 
This flag is later used during GVN hashing to enable node sharing. Following are the performance stats for JMH included with the patch. Granite Rapids (P-core Xeon Server) Baseline : Benchmark (size) Mode Cnt Score Error Units VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms Withopt: Benchmark (size) Mode Cnt Score Error Units VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms Sierra Forest (E-core Xeon Server) Baseline: Benchmark (size) Mode Cnt Score Error Units VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 3902.179 ops/ms Withopt: Benchmark (size) Mode Cnt Score Error Units VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 3352.839 ops/ms VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 2918.805 ops/ms VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 409.482 ops/ms VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 
thrpt 2 6955.057 ops/ms Please review and share your comments. Best Regards, Jatin ------------- Commit messages: - Adding functional and performance tests - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342393 - 8342393: Initial version Changes: https://git.openjdk.org/jdk/pull/22863/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342393 Stats: 389 lines in 5 files changed: 363 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From dlunden at openjdk.org Tue Jan 7 18:03:21 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 7 Jan 2025 18:03:21 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: > When searching for load anti dependences in GCM, it is not always sufficient to just search starting at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks. > > #### Example 1 > > Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti dependency for node 64 membar_release. To discover this anti dependency, we must rather search from 119 Phi which contains overlapping memory slices with 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). 
If we only search from 107 Phi, we fail to add the anti dependency to 64 membar_release and do not force the load to schedule before 64 membar_release as we should. In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti dependency on 64 membar_release). The result is the failure we see in this issue (we load the wrong value). > > ![failure-graph-1](https://github.com/user-attachments/assets/e5458646-7a5c-40e1-b1d8-e3f101e29b73) > ![failure-blocks-1](https://github.com/user-attachments/assets/a0b1f724-0809-4b2f-9feb-93e9c59a5d6a) > > #### Example 2 > > There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9. > > ![failure-graph-2](https://github.com/user-attachments/assets/ede0c299-6251-4ff8-8b84-af40a1ee9e8c) > ![failure-blocks-2](https://github.com/user-attachments/assets/e5a87e43-b6fe-4fa3-8961-54752f63633e) > > ### Changeset > > - Update `PhaseCFG::insert... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Updates after comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22852/files - new: https://git.openjdk.org/jdk/pull/22852/files/e154c9f0..9ec33d53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22852&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22852&range=00-01 Stats: 24 lines in 1 file changed: 14 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22852.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22852/head:pull/22852 PR: https://git.openjdk.org/jdk/pull/22852 From dlunden at openjdk.org Tue Jan 7 18:03:21 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 7 Jan 2025 18:03:21 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 11:10:35 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > Good analysis, Daniel! Given the presence of overlapping memory phis (memory phis that are placed in the same block and include aliasing memory slices), the general idea of this fix seems reasonable to me. As a more fundamental solution, it would be worth investigating (perhaps separately) the root cause of this overlap and exploring whether it is feasible to enforce disjointness (an invariant apparently assumed by the original `PhaseCFG::insert_anti_dependences` algorithm), at least during code generation. > > Does the comment above the definition of `initial_mem` require any update as part of this change? Thanks for the review @robcasloz!
> Given the presence of overlapping memory phis (memory phis that are placed in the same block and include aliasing memory slices), the general idea of this fix seems reasonable to me. As a more fundamental solution, it would be worth investigating (perhaps separately) the root cause of this overlap and exploring whether it is feasible to enforce disjointness (an invariant apparently assumed by the original PhaseCFG::insert_anti_dependences algorithm), at least during code generation. Yes, I think it could be useful to investigate further. Based on my observations while working on this issue, the overlapping memory Phis likely result from loop peeling. However, the Phi overlap is not the sole cause of this issue, as the second example demonstrates. I suggest we write an RFE and investigate in a separate issue. > Does the comment above the definition of initial_mem require any update as part of this change? Yes, thanks. Added now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22852#issuecomment-2575915756 From dlunden at openjdk.org Tue Jan 7 18:06:37 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 7 Jan 2025 18:06:37 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 10:52:35 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 773: > >> 771: worklist_def_use_mem_states.push(nullptr, initial_mem); >> 772: Block* initial_mem_block = get_block_for_node(initial_mem); >> 773: if (load->in(0) && initial_mem_block != nullptr) { > > What would be the conditions needed for `initial_mem_block == nullptr`? (I did a quick run with an additional assertion and could not find any).
It would be great to narrow down this to understand better the completeness of the fix and convince ourselves we are not leaving interesting cases unaddressed. This was more of a sanity check, because I did notice memory nodes without a block elsewhere in the graph at this stage of compilation. But, you are right. It seems initial memory for loads here always have a block. I replaced the check with an assert and reran tests, and it looks fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1905860244 From jbhateja at openjdk.org Tue Jan 7 18:06:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 7 Jan 2025 18:06:50 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v2] In-Reply-To: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> > Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. > Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus > obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. > > Following are the performance stats for JMH included with the patch. 
> > > Granite Rapids (P-core Xeon Server) > Baseline : > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms > > Sierra Forest (E-core Xeon Server) > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 390... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: removing spaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22863/files - new: https://git.openjdk.org/jdk/pull/22863/files/21b2ffd8..f42645a1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=00-01 Stats: 5 lines in 2 files changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From dlunden at openjdk.org Tue Jan 7 18:10:36 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 7 Jan 2025 18:10:36 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 10:58:56 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 783: > >> 781: // Stop searching when we run out of dominators (b == nullptr) or when we >> 782: // step past the initial memory block (b == initial_mem_block->_idom). >> 783: while (b != nullptr && b != initial_mem_block->_idom) { > > Can `b` be `nullptr` here? The highest up we can go in the dominator tree is the start block, whose immediate dominator is the root block, no? Thanks, you are right. `b` is never `nullptr` and the second condition on its own is sufficient. I've replaced the first condition with an assert. > src/hotspot/share/opto/gcm.cpp line 785: > >> 783: while (b != nullptr && b != initial_mem_block->_idom) { >> 784: if (b == initial_mem_block && !initial_mem->is_Phi()) { >> 785: break; > > Could you add a brief code comment here explaining why the early break?
Sure, now added! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1905863109 PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1905863951 From dlunden at openjdk.org Tue Jan 7 18:13:37 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 7 Jan 2025 18:13:37 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 11:03:17 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 788: > >> 786: } >> 787: for (uint i = 0; i < b->number_of_nodes(); ++i) { >> 788: Node* n = b->get_node(i); > > For sanity (and efficiency), we might want to (in a separate RFE, and if at all possible) make GCM insert Phi nodes to blocks before any Mach node, and add an early break here. Meanwhile, could you add a comment here explaining that we have to traverse the entire block because Phi nodes might be interleaved with Mach nodes as LCM may not have run yet? Sure, I've now added a comment. It seems that Phi nodes are usually predictably located close to the beginning of blocks, but there are exceptions. Sounds good to investigate separately, I'll create an RFE.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1905867175 From dlunden at openjdk.org Tue Jan 7 18:16:42 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 7 Jan 2025 18:16:42 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 11:04:54 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 790: > >> 788: Node* n = b->get_node(i); >> 789: if (n->is_memory_phi() && C->can_alias(n->adr_type(), load_alias_idx)) { >> 790: worklist_def_use_mem_states.push(nullptr, n); > > We may push `` again here, is that an issue? If not, maybe add a comment explaining why. It is not a problem because the `push` method includes duplication handling for Phis. However, I think it is better for both clarity and efficiency to just check `n != initial_mem` here explicitly (now added).
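For readers following the review of the dominator walk in `gcm.cpp`, the loop shape under discussion can be sketched in Java as below. This is only a simplified model, not HotSpot code: `Block`, `Node`, and `collect` are hypothetical names, aliasing is reduced to comparing alias indices (the real code calls `C->can_alias`), and the worklist is a plain list. It shows the walk from the load's block up the dominator tree, the stop condition at the immediate dominator of the initial memory block, the early break from the quoted snippet, and the explicit `n != initial_mem` check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the search; these are not HotSpot classes.
final class DominatorWalkSketch {
    /** A node in a block; only memory phis with a matching alias index are collected. */
    static final class Node {
        final String name;
        final boolean memoryPhi;
        final int aliasIdx; // assumption: aliasing is modelled as equality of alias indices
        Node(String name, boolean memoryPhi, int aliasIdx) {
            this.name = name;
            this.memoryPhi = memoryPhi;
            this.aliasIdx = aliasIdx;
        }
    }

    /** A basic block with its immediate dominator and the nodes placed in it. */
    static final class Block {
        final String name;
        final Block idom;
        final List<Node> nodes = new ArrayList<>();
        Block(String name, Block idom) {
            this.name = name;
            this.idom = idom;
        }
    }

    /** Collects the initial memory state plus aliasing memory phis on the dominator path. */
    static List<Node> collect(Block loadBlock, Block initialMemBlock, Node initialMem, int loadAliasIdx) {
        List<Node> worklist = new ArrayList<>();
        worklist.add(initialMem);
        // Stop once we step past the initial memory block.
        for (Block b = loadBlock; b != initialMemBlock.idom; b = b.idom) {
            if (b == null) {
                // Plays the role of the assert that replaced the `b != nullptr` loop condition.
                throw new IllegalStateException("ran out of dominators");
            }
            if (b == initialMemBlock && !initialMem.memoryPhi) {
                break; // mirrors the early break in the quoted snippet
            }
            // Visit the whole block: phis are not assumed to come first.
            for (Node n : b.nodes) {
                if (n != initialMem && n.memoryPhi && n.aliasIdx == loadAliasIdx) {
                    worklist.add(n);
                }
            }
        }
        return worklist;
    }

    public static void main(String[] args) {
        Block root = new Block("root", null);
        Block a = new Block("A", root);
        Block b = new Block("B", a);
        Node initialMem = new Node("phiA", true, 1);
        a.nodes.add(initialMem);
        b.nodes.add(new Node("phiB", true, 1));
        for (Node n : collect(b, a, initialMem, 1)) {
            System.out.println(n.name);
        }
    }
}
```

The explicit identity check keeps the initial memory state from being added twice without relying on duplicate handling inside the worklist, which is the clarity argument made in the reply above.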
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1905869926 From shade at openjdk.org Tue Jan 7 18:21:39 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Jan 2025 18:21:39 GMT Subject: RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 16:39:43 GMT, Aleksey Shipilev wrote: > [JDK-8334733](https://bugs.openjdk.org/browse/JDK-8334733) removed the filter for `ModuleInfoWriter`, which now causes standalone CTW to fail when building: > > > $ export JAVA_HOME= > $ export PATH=$JAVA_HOME/bin:$PATH > $ cd test/hotspot/jtreg/testlibrary/ctw > $ make > > /home/shipilev/shipilev-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/../bin/javac --add-exports java.base/jdk.internal.jimage=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.reflect=ALL-UNNAMED --add-exports java.base/jdk.internal.access=ALL-UNNAMED -sourcepath src -d build/classes -cp dist/wb.jar @filelist > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:44: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleResolution; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:45: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleTarget; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > Note: Some input files use unchecked or unsafe operations. > Note: Recompile with -Xlint:unchecked for details. > 2 errors > make: *** [dist/ctw.jar] Error 1 > > > Additional testing: > - [x] CTW `make` works now > - [x] Standalone CTW works now Thanks! I am going to integrate soon under triviality rule. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22952#issuecomment-2575951617 From kvn at openjdk.org Tue Jan 7 18:29:44 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 7 Jan 2025 18:29:44 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing In-Reply-To: References: Message-ID: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> On Tue, 17 Dec 2024 09:01:57 GMT, Theo Weidmann wrote: > C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: > > > public static double process(final double x) { > double w = (double) 0.1; > double p = 0; > p = (double) (3.109615012413746E307 % (w % Z)); > p = (double) (7.614949555185036E307 / (x % x)); // <- return value only dependends on this line > return (double) (x * p); > } > > > To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. 
This will most likely affect performance on 32-bit x86, since these were simple nodes there and not Call nodes. But the platform is going away, so it should be fine. You need to treat new Mod nodes as leaf calls without side effects instead of general Call nodes. src/hotspot/share/opto/callnode.cpp line 721: > 719: const Type *CallNode::bottom_type() const { return tf()->range(); } > 720: const Type* CallNode::Value(PhaseGVN* phase) const { > 721: if (!in(0) || phase->type(in(0)) == Type::TOP) { We use explicit compare `in(0) == nullptr`. src/hotspot/share/opto/divnode.hpp line 145: > 143: > 144: // Base class for float and double modulus > 145: class ModFloatingNode : public CallNode { I think it should subclass `CallLeafNode` class since Mod runtime functions do not have side effects. src/hotspot/share/opto/macro.cpp line 2602: > 2600: call->init_req(TypeFunc::Parms + i, mod_macro->in(TypeFunc::Parms + i)); > 2601: } > 2602: call->copy_call_debug_info(&_igvn, call); This is used only for `CallJavaNode`. src/hotspot/share/opto/parse2.cpp line 1100: > 1098: > 1099: Node* Parse::floating_point_mod(Node* a, Node* b, bool dbl) { > 1100: CallNode* mod = dbl ? static_cast(new ModDNode(C, a, b)) : new ModFNode(C, a, b); Why you need the case for `ModDNode`? src/hotspot/share/opto/parse2.cpp line 1103: > 1101: > 1102: Node* prev_mem = set_predefined_input_for_runtime_call(mod); > 1103: mod = _gvn.transform(mod)->as_Call(); Is `as_Call()` used to check with assert?
------------- PR Review: https://git.openjdk.org/jdk/pull/22786#pullrequestreview-2534921424 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1905780293 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1905867970 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1905877628 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1905785504 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1905788193 From shade at openjdk.org Tue Jan 7 19:36:55 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 7 Jan 2025 19:36:55 GMT Subject: Integrated: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 16:39:43 GMT, Aleksey Shipilev wrote: > [JDK-8334733](https://bugs.openjdk.org/browse/JDK-8334733) removed the filter for `ModuleInfoWriter`, which now causes standalone CTW to fail when building: > > > $ export JAVA_HOME= > $ export PATH=$JAVA_HOME/bin:$PATH > $ cd test/hotspot/jtreg/testlibrary/ctw > $ make > > /home/shipilev/shipilev-jdk/build/linux-x86_64-server-fastdebug/images/jdk/bin/../bin/javac --add-exports java.base/jdk.internal.jimage=ALL-UNNAMED --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.reflect=ALL-UNNAMED --add-exports java.base/jdk.internal.access=ALL-UNNAMED -sourcepath src -d build/classes -cp dist/wb.jar @filelist > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:44: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleResolution; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed module) > ../../../../../test/lib/jdk/test/lib/util/ModuleInfoWriter.java:45: error: package jdk.internal.module is not visible > import jdk.internal.module.ModuleTarget; > ^ > (package jdk.internal.module is declared in module java.base, which does not export it to the unnamed 
module) > Note: Some input files use unchecked or unsafe operations. > Note: Recompile with -Xlint:unchecked for details. > 2 errors > make: *** [dist/ctw.jar] Error 1 > > > Additional testing: > - [x] CTW `make` works now > - [x] Standalone CTW works now This pull request has now been integrated. Changeset: e413fc64 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/e413fc643c4a58e3c46d81025c3ac9fbf89db4b9 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8347127: CTW fails to build after JDK-8334733 Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/22952 From vlivanov at openjdk.org Tue Jan 7 21:02:15 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 7 Jan 2025 21:02:15 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v2] In-Reply-To: <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> Message-ID: On Tue, 7 Jan 2025 18:06:50 GMT, Jatin Bhateja wrote: >> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. >> Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus >> obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. >> >> Following are the performance stats for JMH micro included with the patch. 
>> >> >> Granite Rapids (P-core Xeon Server) >> Baseline : >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms >> >> Sierra Forest (E-core Xeon Server) >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms >> VectorCommutativeOperSharingBenc... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > removing spaces Nice improvement, Jatin. src/hotspot/share/opto/phaseX.cpp line 144: > 142: // node sharing. > 143: if (n->is_commutative_operation() && k->in(0) == n->in(0) && req == 3) { > 144: if (!((n->in(1) == k->in(1) && n->in(2) == k->in(2)) || I find this piece hard to read. What do you think about the following? 
if (n->is_commutative_operation()) { assert(k->is_commutative_operation(), ""); assert(req == 3, ""); if (k->in(0) == n->in(0) && (k->in(1) == n->in(1) || k->in(1) == n->in(2)) && (k->in(2) == n->in(1) || k->in(2) == n->in(2))) { // nodes are equal } else { goto collision; } } else { ... src/hotspot/share/opto/vectornode.hpp line 78: > 76: virtual uint hash() const { > 77: if (is_commutative_operation()) { > 78: return (uintptr_t)in(1) + (uintptr_t)in(2) + Opcode(); Commutative implies the operation is binary (`req == 3`). Should there be an assert somewhere to ensure it's always the case (ideally, when marking a node as commutative during construction)? ------------- PR Review: https://git.openjdk.org/jdk/pull/22863#pullrequestreview-2535334749 PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1906032701 PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1906020222 From vlivanov at openjdk.org Tue Jan 7 22:50:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 7 Jan 2025 22:50:46 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 02:23:30 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. 
>> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Implement apply_identity Were there any experiments conducted to port existing lowering transformations to the new pass? As we discussed before, there are multiple places in the code where lowering takes place. It is still not clear to me how much proposed solution unifies across existing use cases. What I'd really like to avoid is yet another peculiar way to perform lowering transformations in C2. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2576372165 From dhanalla at openjdk.org Wed Jan 8 00:33:59 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 8 Jan 2025 00:33:59 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v3] In-Reply-To: References: Message-ID: > As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation. > > > Here are the sequence of Ideal graph transformations for Nested phi: > > > ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) > > ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) > > ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) > > JMH results: > with disabled RAM > > Benchmark Mode Cnt Score Error Units > NestedPhiAndRematerialize.NopRAM.testBailOut_runner avgt 15 13.969 ± 0.248 ms/op > NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner avgt 15 80.300 ±
4.306 ms/op > NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner avgt 15 72.182 ± 1.781 ms/op > NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner avgt 15 2.983 ± 0.001 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner avgt 15 18.342 ± 0.731 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner avgt 15 14.315 ± 0.443 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner avgt 15 18.511 ± 1.212 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner avgt 15 66.277 ± 1.478 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner avgt 15 17.968 ± 0.306 ms/op > NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner avgt 15 14.186 ± 0.247 ms/op > NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner avgt 15 88.435 ± 4.869 ms/op > NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner avgt 15 29560.130 ± 48.797 ms/op > NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner avgt 15 49.150 ± 2.307 ms/op > NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner avgt 15 18.236 ± 0.308 ms/op > > with enabled RAM > Benchmark Mode Cnt Score Error Units > NestedPhiAndRematerialize.YesRAM.testBailOut_runner avgt 15 3.257 ± 0.423 ms/op > NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner avgt 15 79.916 ± 3.477 ms/op > NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner avgt 15 72.053 ± 1.916 ms/op > NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner avgt 15 2.984 ± 0.001 ms/op > NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner avgt 15 18.309 ± 0.706 ms/op > NestedPhiAndRematerialize.YesRAM.testNestedPhiProces...
Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: update bug id in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21270/files - new: https://git.openjdk.org/jdk/pull/21270/files/811232d4..9a1fb48a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21270&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21270.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21270/head:pull/21270 PR: https://git.openjdk.org/jdk/pull/21270 From dhanalla at openjdk.org Wed Jan 8 00:38:39 2025 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Wed, 8 Jan 2025 00:38:39 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:56:46 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 34: >> >>> 32: * @summary Tests that C2 can correctly scalar replace some object allocation merges. >>> 33: * @library /test/lib / >>> 34: * @requires vm.debug == true & vm.flagless & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.EliminateAllocations >> >> Do you need all of these? Or is it just that IR rules are failing otherwise? >> If it is just about the IR rules, you can restrict IR rules with `applyIf...` > > Generally it is nice if tests can run on as many platforms, compilers and flags as possible. But of course IR rules can only apply under specific circumstances. These compiler flags are common for all the tests. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1906191586 From syan at openjdk.org Wed Jan 8 02:40:41 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 8 Jan 2025 02:40:41 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: References: Message-ID: <411pqFXlbWE-kI_RzcfhSVHNWyyAue_QUmJXdwAmbn4=.b3050c1c-841d-44bb-a896-4574166caa9f@github.com> On Tue, 7 Jan 2025 17:34:14 GMT, Emanuel Peter wrote: > @sendaoYan the Affected version of JDK-8346965 only lists JDK25. Since you are backporting to JDK24, this must be incomplete... Can you add all affected versions and link the change that caused the issue? Sorry, the affected versions have now all been updated, and the link to the change that caused the issue has been added. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22950#issuecomment-2576589476 From epeter at openjdk.org Wed Jan 8 06:50:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 06:50:36 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 15:14:34 GMT, SendaoYan wrote: > Hi all, > > This pull request contains a backport of commit [cf3e48e7](https://github.com/openjdk/jdk/commit/cf3e48e77172db7e27530af9754e1ead8d493f52) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by SendaoYan on 7 Jan 2025 and was reviewed by Vladimir Kozlov. > > Thanks! Backport looks good. ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22950#pullrequestreview-2536142158 From epeter at openjdk.org Wed Jan 8 06:50:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 06:50:36 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <411pqFXlbWE-kI_RzcfhSVHNWyyAue_QUmJXdwAmbn4=.b3050c1c-841d-44bb-a896-4574166caa9f@github.com> References: <411pqFXlbWE-kI_RzcfhSVHNWyyAue_QUmJXdwAmbn4=.b3050c1c-841d-44bb-a896-4574166caa9f@github.com> Message-ID: On Wed, 8 Jan 2025 02:38:09 GMT, SendaoYan wrote: >> @sendaoYan the Affected version of JDK-8346965 only lists JDK25. >> Since you are backporting to JDK24, this must be incomplete... Can you add all affected versions and link the change that caused the issue? > >> @sendaoYan the Affected version of JDK-8346965 only lists JDK25. Since you are backporting to JDK24, this must be incomplete... Can you add all affected versions and link the change that caused the issue? > > Sorry, the all affected versions has been updated. And the caused link has been added, @sendaoYan But the likely cause [JDK-8311248](https://bugs.openjdk.org/browse/JDK-8311248) was integrated in JDK23, so should that version not also be in the affected list? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22950#issuecomment-2576868851 From epeter at openjdk.org Wed Jan 8 07:26:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 07:26:36 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime In-Reply-To: References: Message-ID: <-AHr3PMJ65vouvjHIlhWR2zvJL_4IVyOtT3oJQoJxuQ=.70193b5b-c44c-4e34-929c-1036b68f5e55@github.com> On Wed, 25 Dec 2024 14:54:02 GMT, Quan Anh Mai wrote: > Hi, > > This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. 
> > Please take a look, thanks a lot. Looks interesting :) Would we have caught any bug with this, do you have an example? If I remember correctly, we relax/widen the Cast ranges somewhere later in optimizations, so that different CastII etc can common. Probably happens after loop-opts. So the ranges usually go from `[1..10]` -> `[0, max]` or `[-1 .. 1]` -> `int`. So this verification would then not be super effective, right? Things might have gone wrong much earlier with bad assumptions. I mean it could still catch issues, but I'm not sure how likely that is? TLDR: I'd like some more context / motivation for this patch ;) And: you should have at least one plain test where you enable the flag, and it compiles everything required to run an empty `main` function. Ah, I actually see that you have some examples. So you plan on introducing this flag first, and only then fixing the issues? But does it fail with a simple `java --version`? Or an empty `main` method, maybe with `-Xcomp`? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22880#pullrequestreview-2536189779 PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2576918982 From qamai at openjdk.org Wed Jan 8 07:50:44 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Jan 2025 07:50:44 GMT Subject: RFR: 8346836: C2: Introduce a way to verify the correctness of ConstraintCastNodes at runtime In-Reply-To: <-AHr3PMJ65vouvjHIlhWR2zvJL_4IVyOtT3oJQoJxuQ=.70193b5b-c44c-4e34-929c-1036b68f5e55@github.com> References: <-AHr3PMJ65vouvjHIlhWR2zvJL_4IVyOtT3oJQoJxuQ=.70193b5b-c44c-4e34-929c-1036b68f5e55@github.com> Message-ID: On Wed, 8 Jan 2025 07:24:18 GMT, Emanuel Peter wrote: >> Hi, >> >> This patch adds a develop flag `VerifyConstraintCasts`, which will verify the correctness of `CastIINode`s and `CastLLNode`s at runtime and crash the VM if the dynamic value lies outside the type value range. >> >> Please take a look, thanks a lot. 
> > Ah, I actually see that you have some examples. So you plan on introducing this flag first, and only then fixing the issues? But does it fail with a simple `java --version`? Or an empty `main` method, maybe with `-Xcomp`? @eme64 Thanks for looking at this. The context is that while reviewing #22666 I came to the conclusion that our handling of `depends_only_on_test` is broken. I have added a comment explaining my understanding and concerns there. In principle, before the execution, a `DivINode` is the same as a `CastIINode` which limits the value range of the divisor to `!= 0`. As a result, there should not be any difference in the way we handle the movements of these nodes. This leads me to the conclusion that `CastIINode`s may also be wired to the wrong control input; the reason we have not caught them is that, unlike a division complaining loudly, a `CastIINode` will silently accept incorrect input values. This motivates me to make this patch. > If I remember correctly, we relax/widen the Cast ranges somewhere later in optimizations, so that different CastII etc can common. Probably happens after loop-opts. So the ranges usually go from `[1..10]` -> `[0, max]` or `[-1 .. 1]` -> `int`. You are right, it is in `ConstraintCastNode::widen_type`, and I will disable that widening in the presence of `VerifyConstraintCasts`. > So you plan on introducing this flag first, and only then fixing the issues? There are several failures in tier 1 alone, and this flag is not enabled by default or in the pipeline, so I think incorporating it first would be preferable, then after fixing all the issues we can add it to the stress options. > But does it fail with a simple `java --version`? Or an empty `main` method, maybe with `-Xcomp`? No, it does not fail with `--version` or with an empty `main` method, with or without `-Xcomp`.
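The difference between a cast that silently trusts its range and one that is verified at runtime can be illustrated with a small Java model. This is only a sketch of the idea described above, not how the `VerifyConstraintCasts` flag is implemented in C2; the class and method names are hypothetical, and the range is modelled as a closed interval `[lo, hi]`.

```java
// Hypothetical model of the idea; not HotSpot code.
final class RangeCastSketch {
    /** Models an unverified cast: the claimed range [lo, hi] is trusted and never checked. */
    static int uncheckedCast(int value, int lo, int hi) {
        return value; // an out-of-range value passes through silently
    }

    /** Models a verified cast: fails loudly if the dynamic value is outside [lo, hi]. */
    static int verifiedCast(int value, int lo, int hi) {
        if (value < lo || value > hi) {
            throw new IllegalStateException(
                "value " + value + " outside claimed range [" + lo + ", " + hi + "]");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(uncheckedCast(0, 1, 10)); // wrong assumption goes unnoticed
        try {
            verifiedCast(0, 1, 10);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

In this model the unverified variant behaves like the "silently accept incorrect input values" case from the message, while the verified variant surfaces a wrong range assumption at the point where it is violated, much as a division by zero would.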
------------- PR Comment: https://git.openjdk.org/jdk/pull/22880#issuecomment-2576960560 From epeter at openjdk.org Wed Jan 8 08:25:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 08:25:49 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing In-Reply-To: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Tue, 7 Jan 2025 16:54:46 GMT, Vladimir Kozlov wrote: >> C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 >> >> The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: >> >> >> public static double process(final double x) { >> double w = (double) 0.1; >> double p = 0; >> p = (double) (3.109615012413746E307 % (w % Z)); >> p = (double) (7.614949555185036E307 / (x % x)); // <- return value only dependends on this line >> return (double) (x * p); >> } >> >> >> To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. 
For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. > > src/hotspot/share/opto/callnode.cpp line 721: > >> 719: const Type *CallNode::bottom_type() const { return tf()->range(); } >> 720: const Type* CallNode::Value(PhaseGVN* phase) const { >> 721: if (!in(0) || phase->type(in(0)) == Type::TOP) { > > We use explicit compare `in(0) == nullptr`. Suggestion: if (in(0) == nullptr || phase->type(in(0)) == Type::TOP) { `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.` See https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > src/hotspot/share/opto/parse2.cpp line 1100: > >> 1098: >> 1099: Node* Parse::floating_point_mod(Node* a, Node* b, bool dbl) { >> 1100: CallNode* mod = dbl ? static_cast(new ModDNode(C, a, b)) : new ModFNode(C, a, b); > > Why you need the case for `ModDNode`? He has a call from `Bytecodes::_frem:` and from `Bytecodes::_drem:`. Why not make it a `BasicType bt` instead of `dbl`, and then switch on that? Might be more readable than true / false. I read `floating_point_mod(a, b, true)`, and am not sure what the `true` does. 
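The readability concern about a bare `true` at the call site can be shown with a small Java analogy. This is a sketch under assumptions, not the proposed C++ change: `FpMod`, `Kind`, and `floatingPointMod` are hypothetical names, and the real suggestion was to pass HotSpot's `BasicType` and switch on it.

```java
// Hypothetical Java analogy: an enum parameter documents itself at the call site.
final class FpMod {
    enum Kind { FLOAT, DOUBLE }

    // Computes the floating-point remainder at the requested precision.
    static double floatingPointMod(double a, double b, Kind kind) {
        switch (kind) {
            case FLOAT:
                // Narrow to float first, so the remainder uses float arithmetic.
                return (float) a % (float) b;
            case DOUBLE:
                return a % b;
            default:
                throw new AssertionError("unexpected kind: " + kind);
        }
    }
}
```

A call such as `FpMod.floatingPointMod(a, b, FpMod.Kind.DOUBLE)` states which variant is meant, whereas `floatingPointMod(a, b, true)` forces the reader to look up the parameter.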
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906654174 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906709210 From epeter at openjdk.org Wed Jan 8 08:25:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 08:25:49 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing In-Reply-To: References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Wed, 8 Jan 2025 07:56:22 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/callnode.cpp line 721: >> >>> 719: const Type *CallNode::bottom_type() const { return tf()->range(); } >>> 720: const Type* CallNode::Value(PhaseGVN* phase) const { >>> 721: if (!in(0) || phase->type(in(0)) == Type::TOP) { >> >> We use explicit compare `in(0) == nullptr`. > > Suggestion: > > if (in(0) == nullptr || phase->type(in(0)) == Type::TOP) { > > > `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.` > > See > https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md Why can this now be nullptr? >> src/hotspot/share/opto/parse2.cpp line 1100: >> >>> 1098: >>> 1099: Node* Parse::floating_point_mod(Node* a, Node* b, bool dbl) { >>> 1100: CallNode* mod = dbl ? static_cast(new ModDNode(C, a, b)) : new ModFNode(C, a, b); >> >> Why you need the case for `ModDNode`? > > He has a call from `Bytecodes::_frem:` and from `Bytecodes::_drem:`. > > Why not make it a `BasicType bt` instead of `dbl`, and then switch on that? Might be more readable than true / false. > I read `floating_point_mod(a, b, true)`, and am not sure what the `true` does. Why do you need the `static_cast`? I mean why not use the common type `ModFloatingNode*`, which is a subtype of `CallNode*`, right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906656119 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906716363 From epeter at openjdk.org Wed Jan 8 08:25:47 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 08:25:47 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing In-Reply-To: References: Message-ID: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> On Tue, 17 Dec 2024 09:01:57 GMT, Theo Weidmann wrote: > C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: > > > public static double process(final double x) { > double w = (double) 0.1; > double p = 0; > p = (double) (3.109615012413746E307 % (w % Z)); > p = (double) (7.614949555185036E307 / (x % x)); // <- return value only dependends on this line > return (double) (x * p); > } > > > To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. 
For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. Thanks for working on this! Generally looks reasonable to me, though I left a few comments. src/hotspot/share/opto/divnode.cpp line 61: > 59: init_req(TypeFunc::Parms + 0, a); > 60: init_req(TypeFunc::Parms + 1, b); > 61: } Is there a reason to put this in the cpp file? I think I usually see constructors for Nodes in the hpp file. Nitpicky, sorry. src/hotspot/share/opto/divnode.cpp line 1400: > 1398: > 1399: //============================================================================= > 1400: //------------------------------Idealize--------------------------------------- I would just remove these lines if you are already touching the code here: `//------------------------------Idealize---------------------------------------` src/hotspot/share/opto/divnode.cpp line 1401: > 1399: //============================================================================= > 1400: //------------------------------Idealize--------------------------------------- > 1401: Node *UModLNode::Ideal(PhaseGVN *phase, bool can_reshape) { Suggestion: Node* UModLNode::Ideal(PhaseGVN* phase, bool can_reshape) { Ah, maybe you did not mean to touch it, but on GitHub it looks like you did... Maybe you just reordered things. src/hotspot/share/opto/divnode.cpp line 1409: > 1407: } > 1408: > 1409: Node* ModFNode::Ideal(PhaseGVN* phase, bool can_reshape) { Can you quickly say why you converted this from a `Value` to an `Ideal` method? I guess it is because before it used to be a simple `Node` with a single output, but now it is a `Call` with multiple outputs... Ok makes sense. src/hotspot/share/opto/divnode.cpp line 1419: > 1417: if (t1 == Type::TOP || t2 == Type::TOP) { > 1418: return nullptr; > 1419: } The comment seems to contradict the result. You could return `C->top()`, right?
src/hotspot/share/opto/macro.cpp line 2224: > 2222: > 2223: // Make slow path call > 2224: CallNode *call = make_slow_call(lock, OptoRuntime::complete_monitor_enter_Type(), Suggestion: CallNode* call = make_slow_call(lock, OptoRuntime::complete_monitor_enter_Type(), src/hotspot/share/opto/macro.cpp line 2592: > 2590: CallNode* mod_macro = n->as_Call(); > 2591: CallNode* call = new CallLeafNode(mod_macro->tf(), > 2592: is_drem ? CAST_FROM_FN_PTR(address, SharedRuntime::drem) : CAST_FROM_FN_PTR(address, SharedRuntime::frem), Suggestion: is_drem ? CAST_FROM_FN_PTR(address, SharedRuntime::drem) : CAST_FROM_FN_PTR(address, SharedRuntime::frem), The line is a little long. src/hotspot/share/opto/parse.hpp line 533: > 531: > 532: // Helper functions for shifting & arithmetic > 533: Node* floating_point_mod(Node* a, Node* b, bool dbl); Please use a more descriptive name than `dbl` ;) ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22786#pullrequestreview-2536297498 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906663110 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906668157 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906669805 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906679723 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906676399 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906688741 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906693395 PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906700593 From epeter at openjdk.org Wed Jan 8 08:25:49 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 08:25:49 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing In-Reply-To: 
<26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> References: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> Message-ID: On Wed, 8 Jan 2025 08:04:26 GMT, Emanuel Peter wrote: >> C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 >> >> The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: >> >> >> public static double process(final double x) { >> double w = (double) 0.1; >> double p = 0; >> p = (double) (3.109615012413746E307 % (w % Z)); >> p = (double) (7.614949555185036E307 / (x % x)); // <- return value only dependends on this line >> return (double) (x * p); >> } >> >> >> To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. 
> > src/hotspot/share/opto/divnode.cpp line 1401: > >> 1399: //============================================================================= >> 1400: //------------------------------Idealize--------------------------------------- >> 1401: Node *UModLNode::Ideal(PhaseGVN *phase, bool can_reshape) { > > Suggestion: > > Node* UModLNode::Ideal(PhaseGVN* phase, bool can_reshape) { > > Ah, maybe you did not mean to touch it, but on GitHub it looks like you did... Maybe you just reordered things. Makes it a little trickier to review though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1906671275 From epeter at openjdk.org Wed Jan 8 08:46:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 08:46:51 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: On Wed, 18 Dec 2024 14:36:11 GMT, Yagmur Eren wrote: >> `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`. Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 >> >> As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". >> To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. 
> > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > remove casting trick in set_idx I reran testing for commit 2. Patch looks good though. Ping me again once the tests are complete. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577092614 From epeter at openjdk.org Wed Jan 8 08:50:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 08:50:53 GMT Subject: RFR: 8343629: More MergeStore benchmark [v5] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 04:15:36 GMT, Shaojin Wen wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > seperate MergeStoreBench and MergeLoadBench Sorry it took me this long. I think it is ok as it is. Thanks for the work @wenshao ! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21659#pullrequestreview-2536508155 From syan at openjdk.org Wed Jan 8 09:11:43 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 8 Jan 2025 09:11:43 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: <411pqFXlbWE-kI_RzcfhSVHNWyyAue_QUmJXdwAmbn4=.b3050c1c-841d-44bb-a896-4574166caa9f@github.com> References: <411pqFXlbWE-kI_RzcfhSVHNWyyAue_QUmJXdwAmbn4=.b3050c1c-841d-44bb-a896-4574166caa9f@github.com> Message-ID: On Wed, 8 Jan 2025 02:38:09 GMT, SendaoYan wrote: >> @sendaoYan the Affected version of JDK-8346965 only lists JDK25. >> Since you are backporting to JDK24, this must be incomplete... Can you add all affected versions and link the change that caused the issue? > >> @sendaoYan the Affected version of JDK-8346965 only lists JDK25. 
Since you are backporting to JDK24, this must be incomplete... Can you add all affected versions and link the change that caused the issue? > > Sorry, the all affected versions has been updated. And the caused link has been added, > @sendaoYan But the likely cause [JDK-8311248](https://bugs.openjdk.org/browse/JDK-8311248) was integrated in JDK23, so should that version not also be in the affected list? Thanks, the '23' has been added to the affected list. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22950#issuecomment-2577154150 From epeter at openjdk.org Wed Jan 8 09:13:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 09:13:46 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Wed, 8 Jan 2025 09:03:36 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Updating copyright year of modified files >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Update IR transforms and tests >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - 8342676: Unsigned Vector Min / Max transforms > > src/hotspot/share/opto/vectornode.cpp line 2162: > >> 2160: // UMax (UMin(a, b), UMax(a, b)) => UMax(a, b) >> 2161: // UMax (UMax(a, b), UMin(b, a)) => UMax(a, b) >> 2162: if (umin && umax) { > > That looks like an implicit null check. Not allowed according to style guide: > `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. 
if (x != 0) or if (ptr != nullptr), etc.` > https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md Suggestion: if (umin != nullptr && umax != nullptr) { > test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 29: > >> 27: * @summary Support new unsigned and saturating vector operators in VectorAPI >> 28: * @modules jdk.incubator.vector >> 29: * @requires vm.compiler2.enabled > > I think you can drop that requirement. IR testing is only enabled with C2, and so we can verify that the tests at least execute / have correct results on other platforms / compilers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906841553 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906813301 From epeter at openjdk.org Wed Jan 8 09:13:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 09:13:45 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Tue, 7 Jan 2025 08:58:12 GMT, Jatin Bhateja wrote: >> Adding following IR transforms for unsigned vector Min / Max nodes. >> >> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) >> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) >> => UMaxV (a, a) => a >> => UMinV (a, a) => a >> >> New IR validation test accompanies the patch. >> >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: > > - Updating copyright year of modified files > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - Update IR transforms and tests > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - 8342676: Unsigned Vector Min / Max transforms Generally looks reasonable. Smaller changes like this are much easier to review, thank you for splitting it out. I started some testing, so feel free to ping me again for that. src/hotspot/share/opto/vectornode.cpp line 2140: > 2138: } > 2139: > 2140: static Node* unsigned_min_max_xform(Node* n) { Suggestion: static Node* UMinMaxV_Ideal(Node* n) { I think this is how we generally name functions when we share them between nodes of different types. src/hotspot/share/opto/vectornode.cpp line 2162: > 2160: // UMax (UMin(a, b), UMax(a, b)) => UMax(a, b) > 2161: // UMax (UMax(a, b), UMin(b, a)) => UMax(a, b) > 2162: if (umin && umax) { That looks like an implicit null check. Not allowed according to style guide: `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.` https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md src/hotspot/share/opto/vectornode.cpp line 2164: > 2162: if (umin && umax) { > 2163: if ((umin->in(1) == umax->in(1) && umin->in(2) == umax->in(2)) || > 2164: (umin->in(2) == umax->in(1) && umin->in(1) == umax->in(2))) { Suggestion: if ((umin->in(1) == umax->in(1) && umin->in(2) == umax->in(2)) || (umin->in(2) == umax->in(1) && umin->in(1) == umax->in(2))) { Alignment was off src/hotspot/share/opto/vectornode.hpp line 622: > 620: virtual uint hash() const { > 621: return (uintptr_t)in(1) + (uintptr_t)in(2) + Opcode(); > 622: } Can you explain why you do this instead of `Node::hash`? I think you do it so that you get the same hash if you swap the operands. I suggest you leave a comment in the code.
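The guess about the hash function, that summing the operands makes the hash independent of their order, can be checked with a small Java model. This is a sketch under assumptions rather than the HotSpot implementation: `CommutativeOp` is a hypothetical class, operands are compared by identity in the spirit of the pointer comparison in the quoted C++, and `System.identityHashCode` stands in for the pointer value.

```java
// Hypothetical model of a commutative binary node (not HotSpot code).
final class CommutativeOp {
    private final int opcode;
    private final Object in1;
    private final Object in2;

    CommutativeOp(int opcode, Object in1, Object in2) {
        this.opcode = opcode;
        this.in1 = in1;
        this.in2 = in2;
    }

    // Addition is commutative, so swapping in1 and in2 yields the same hash.
    @Override
    public int hashCode() {
        return System.identityHashCode(in1) + System.identityHashCode(in2) + opcode;
    }

    // Two nodes are equal if they have the same opcode and the same operands
    // in either order; this keeps equals consistent with hashCode.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CommutativeOp)) {
            return false;
        }
        CommutativeOp other = (CommutativeOp) o;
        return opcode == other.opcode
            && ((in1 == other.in1 && in2 == other.in2)
                || (in1 == other.in2 && in2 == other.in1));
    }
}
```

Putting `new CommutativeOp(1, a, b)` and `new CommutativeOp(1, b, a)` into a `java.util.HashSet` leaves a single element, which is the kind of sharing a value-numbering table relies on. A hash that mixed the operands in an order-dependent way would usually place the two in different buckets, and the duplicate would go undetected.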
test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 26: > 24: /** > 25: * @test > 26: * @bug 8338201 Bug number does not match this RFE test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 29: > 27: * @summary Support new unsigned and saturating vector operators in VectorAPI > 28: * @modules jdk.incubator.vector > 29: * @requires vm.compiler2.enabled I think you can drop that requirement. test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 109: > 107: > 108: @Test > 109: @IR(counts = {IRNode.UMAX_VB, " >0 "}, applyIf = {"UseAVX", " >0 "}) I think we usually use CPU features, like `avx`. test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 116: > 114: .lanewise(VectorOperators.UMAX, > 115: ByteVector.fromArray(bspec, byte_in2, i)) > 116: .intoArray(byte_out, i); Suggestion: ByteVector.fromArray(bspec, byte_in1, i) .lanewise(VectorOperators.UMAX, ByteVector.fromArray(bspec, byte_in2, i)) .intoArray(byte_out, i); Alignment looked off test/hotspot/jtreg/compiler/vectorapi/VectorUnsignedMinMaxOperationsTest.java line 453: > 451: IntVector vec1 = IntVector.fromArray(ispec, int_in1, i); > 452: IntVector vec2 = IntVector.fromArray(ispec, int_in2, i); > 453: // UMinV (UMinV vec1, vec2) (UMaxV vec1, vec2) => UMinV vec1 vec2 I think you now always have pattern `minmax (minmax vec1, vec2) (minmax vec1, vec2)`, i.e. with `vec1` as first operand and `vec2` as second. Would be nice to have one where they are swapped: `minmax (minmax vec1, vec2) (minmax vec2, vec1)` ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21604#pullrequestreview-2536521746 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906839550 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906841096 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906844139 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906850212 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906810532 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906812099 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906818097 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906824451 PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1906835049 From epeter at openjdk.org Wed Jan 8 09:33:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 09:33:38 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v2] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: On Tue, 7 Jan 2025 14:07:17 GMT, Theo Weidmann wrote: >> test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 78: >> >>> 76: Asserts.assertEQ(g.next(), 4); >>> 77: Asserts.assertEQ(g.next(), 18); >>> 78: } >> >> It would be nice if you told us / a future person who extends this, what this mocking does, and how it works. > > Do you mean specifically how this test here works or how the mocking works? Or both? Maybe just say what values you feed in, and why it produces the results. That should hopefully help a future person who tries to extend the test for their own type. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1906878215 From shade at openjdk.org Wed Jan 8 09:39:16 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 8 Jan 2025 09:39:16 GMT Subject: [jdk24] RFR: 8347127: CTW fails to build after JDK-8334733 Message-ID: Fixes the JDK 24 regression for standalone CTW runner. ------------- Commit messages: - Backport e413fc643c4a58e3c46d81025c3ac9fbf89db4b9 Changes: https://git.openjdk.org/jdk/pull/22964/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22964&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347127 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22964.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22964/head:pull/22964 PR: https://git.openjdk.org/jdk/pull/22964 From swen at openjdk.org Wed Jan 8 09:44:45 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 8 Jan 2025 09:44:45 GMT Subject: RFR: 8343629: More MergeStore benchmark [v5] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 04:15:36 GMT, Shaojin Wen wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > seperate MergeStoreBench and MergeLoadBench Thanks, Emanuel Peter. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2577224547 From swen at openjdk.org Wed Jan 8 09:44:46 2025 From: swen at openjdk.org (Shaojin Wen) Date: Wed, 8 Jan 2025 09:44:46 GMT Subject: Integrated: 8343629: More MergeStore benchmark In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 07:03:33 GMT, Shaojin Wen wrote: > 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull > 2. 
Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. This pull request has now been integrated. Changeset: b741f3fe Author: Shaojin Wen URL: https://git.openjdk.org/jdk/commit/b741f3fe5b54755d19c5abeca76fdceeccafd448 Stats: 1195 lines in 2 files changed: 537 ins; 425 del; 233 mod 8343629: More MergeStore benchmark Reviewed-by: epeter ------------- PR: https://git.openjdk.org/jdk/pull/21659 From jsjolen at openjdk.org Wed Jan 8 10:26:48 2025 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 8 Jan 2025 10:26:48 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 21:07:40 GMT, Johan Sjölen wrote: >>> Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? >> >> Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. > >> > Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? >> >> Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. > > Don't use `ttyLock`, we really want to get rid of that mechanism. The best would be to port the output to UL, but if that's not possible use a `stringStream` as Dean said.
> @jdksjolen It's been a while since I was working on this, but if I remember correctly: The problem with the approach you suggest is that GrowableArray will fill the entire allocated buffer by calling the default constructor. Moving the new call into the constructor would therefore cause "n+1" heap allocations every time GrowableArray grows and some of these allocations might never be used. > > https://github.com/openjdk/jdk/blob/9702accdd9a25e05628d470bf248edd5d80c0c4d/src/hotspot/share/utilities/growableArray.hpp#L521-L534 Aha yes, I see. The 'uninitialized' capacity is, in fact, initialized. OK, let's not do that. I'll fix the Treap allocator and ping you here and give approval when that's done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2577318630 From jbhateja at openjdk.org Wed Jan 8 10:37:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 8 Jan 2025 10:37:13 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v3] In-Reply-To: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: > Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. > Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus > obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. > > Following are the performance stats for JMH micro included with the patch. 
> > > Granite Rapids (P-core Xeon Server) > Baseline : > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms > > Sierra Forest (E-core Xeon Server) > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22863/files - new: https://git.openjdk.org/jdk/pull/22863/files/f42645a1..a4c52ecc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=01-02 Stats: 8 lines in 1 file changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From jbhateja at openjdk.org Wed Jan 8 10:45:41 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 8 Jan 2025 10:45:41 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v4] In-Reply-To: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: > Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. > Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus > obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. > > Following are the performance stats for JMH micro included with the patch. 
> > > Granite Rapids (P-core Xeon Server) > Baseline : > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms > > Sierra Forest (E-core Xeon Server) > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: strict assertion check on commutative operation input count ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22863/files - new: https://git.openjdk.org/jdk/pull/22863/files/a4c52ecc..39184960 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From jbhateja at openjdk.org Wed Jan 8 10:45:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 8 Jan 2025 10:45:42 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v2] In-Reply-To: References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> Message-ID: On Tue, 7 Jan 2025 20:47:07 GMT, Vladimir Ivanov wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> removing spaces > > src/hotspot/share/opto/vectornode.hpp line 78: > >> 76: virtual uint hash() const { >> 77: if (is_commutative_operation()) { >> 78: return (uintptr_t)in(1) + (uintptr_t)in(2) + Opcode(); > > Commutative implies the operation is binary (`req == 3`). Should there be an assert somewhere to ensure it's always the case (ideally, when marking a node as commutative during construction)? Assertion check added. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1906981076 From dlunden at openjdk.org Wed Jan 8 11:01:49 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 8 Jan 2025 11:01:49 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Mon, 23 Dec 2024 11:06:05 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 799: > >> 797: def_mem_state = use_mem_state; // It's not a possibly interfering store. >> 798: if (use_mem_state == initial_mem) >> 799: initial_mem = nullptr; // only process initial memory once > > Could you explain these changes? Yes, of course. In the old version, we ensure that we only add the children of `initial_mem` once by checking if `use_mem_state == initial_mem` and then setting `initial_mem = nullptr` after the first occurrence. In the new version, we make use of the fact that it is equivalent to instead simply check if `def_mem_state == nullptr`. Entries on the worklist have `def_mem_state == nullptr` iff they are initial memory states that were added before the main worklist loop. We never add any entries to the worklist with `def_mem_state == nullptr` within the worklist loop itself, so we are guaranteed to process the children of entries with `def_mem_state == nullptr` only once (and we no longer need to set `initial_mem = nullptr`). The simplification is especially important now that we can have multiple initial memory states. If we would do it the old way, we would instead need to check if `use_mem_state` is a member of the set of initial memory states, and also keep track of which initial memory states we have already visited.
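The worklist invariant described above can be illustrated with a small, deliberately simplified model. This is a sketch only, not the HotSpot code: the class `InitialMemoryWorklistSketch`, the `Entry` type, and the string-named memory states are all hypothetical, and the real loop in `gcm.cpp` does considerably more than this. The point it shows is that when only the seeded entries carry a null `def`, each initial state is expanded exactly once without tracking a visited set, even if one initial state is also a user of another.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

// Hypothetical model of the worklist invariant; names are illustrative, not HotSpot's.
final class InitialMemoryWorklistSketch {
    static final class Entry {
        final String def; // null only for the seeded initial memory states
        final String use;
        Entry(String def, String use) { this.def = def; this.use = use; }
    }

    // Returns the states whose users were expanded, in processing order.
    static List<String> expandedStates(List<String> initialStates,
                                       Map<String, List<String>> users) {
        Deque<Entry> worklist = new ArrayDeque<>();
        // Seed the worklist before the main loop: these are the only null-def entries.
        for (String init : initialStates) {
            worklist.add(new Entry(null, init));
        }
        List<String> expanded = new ArrayList<>();
        while (!worklist.isEmpty()) {
            Entry e = worklist.poll();
            if (e.def == null) {
                expanded.add(e.use);
                // Entries pushed inside the loop always have a non-null def,
                // so an initial state reached again as a user is not re-expanded.
                for (String user : users.getOrDefault(e.use, List.of())) {
                    worklist.add(new Entry(e.use, user));
                }
            }
        }
        return expanded;
    }

    public static void main(String[] args) {
        // m2 is both an initial state and a user of m1.
        Map<String, List<String>> users =
            Map.of("m1", List.of("s1", "m2"), "m2", List.of("s2"));
        System.out.println(expandedStates(List.of("m1", "m2"), users));
    }
}
```

In this model, the alternative of comparing `use` against a set of initial states would need an extra "already visited" set, which is the bookkeeping the message says the new check avoids.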
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1907002868 From jbhateja at openjdk.org Wed Jan 8 11:39:51 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 8 Jan 2025 11:39:51 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Wed, 8 Jan 2025 09:10:12 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Updating copyright year of modified files >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - Update IR transforms and tests >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 >> - 8342676: Unsigned Vector Min / Max transforms > > src/hotspot/share/opto/vectornode.hpp line 622: > >> 620: virtual uint hash() const { >> 621: return (uintptr_t)in(1) + (uintptr_t)in(2) + Opcode(); >> 622: } > > Can you explain why you do this instead of `Node::hash`? > I think you do it so that you get the same hash if you swap the operands. > I suggest you leave a comment in the code. I think it will be better to schedule this patch after https://github.com/openjdk/jdk/pull/22863 where these nodes are marked as commutative operations.
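The reviewer's reading of the quoted `hash()` (a sum of the two inputs plus the opcode gives the same value when the operands are swapped) can be checked with a small Java model. This is only a sketch under stated assumptions: the class `CommutativeHashSketch` and both methods are hypothetical, plain `int` values stand in for the input node pointers, and `orderedHash` is a generic order-sensitive combination chosen for contrast, not HotSpot's `Node::hash`.

```java
// Hypothetical model: ints stand in for the node pointers in(1) and in(2).
final class CommutativeHashSketch {
    // A generic order-sensitive combination (for contrast only).
    static int orderedHash(int in1, int in2, int opcode) {
        return 31 * (31 * in1 + in2) + opcode;
    }

    // Mirrors the shape of the quoted hash(): addition is symmetric,
    // so swapping the two inputs cannot change the result.
    static int commutativeHash(int in1, int in2, int opcode) {
        return in1 + in2 + opcode;
    }

    public static void main(String[] args) {
        System.out.println(commutativeHash(7, 11, 3) == commutativeHash(11, 7, 3));
        System.out.println(orderedHash(7, 11, 3) == orderedHash(11, 7, 3));
    }
}
```

A symmetric hash only places swapped nodes on the same probe sequence; sharing them also requires the equality check used by the hash table to accept swapped inputs.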
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1907048571 From mli at openjdk.org Wed Jan 8 11:45:37 2025 From: mli at openjdk.org (Hamlin Li) Date: Wed, 8 Jan 2025 11:45:37 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 11:03:09 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. >> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Thanks for fixing the failing tests. One comment: it seems only powerOf2Minus1 failed in both tests. Can we disable only the IR verification of powerOf2Minus1, and keep the other tests enabled?
------------- PR Review: https://git.openjdk.org/jdk/pull/22874#pullrequestreview-2536924520 From thartmann at openjdk.org Wed Jan 8 12:12:32 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 12:12:32 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic Message-ID: C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 If these guards pass, the array length is loaded: https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 But since the `LoadRangeNode` is not pinned, it might float above the array guard: https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. Thanks to @shipilev for identifying the root cause! 
Best regards, Tobias ------------- Commit messages: - 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic Changes: https://git.openjdk.org/jdk/pull/22967/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347006 Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22967/head:pull/22967 PR: https://git.openjdk.org/jdk/pull/22967 From roland at openjdk.org Wed Jan 8 12:19:41 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 12:19:41 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:07:16 GMT, Tobias Hartmann wrote: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! 
> > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias Looks good to me. src/hotspot/share/opto/library_call.cpp line 5920: > 5918: // Keep track of the information that src/dest are arrays to prevent below array specific accesses from floating above. > 5919: generate_non_array_guard(load_object_klass(src), slow_region); > 5920: const Type* tary = TypeAryPtr::make(TypePtr::BotPTR, TypeAry::make(Type::BOTTOM, TypeInt::POS), nullptr, false, Type::OffsetBot); Is this never used elsewhere? Should it a static field in `TypeAryPtr` same as `TypeAryPtr::BYTES` and friends? ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22967#pullrequestreview-2536996400 PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907095997 From shade at openjdk.org Wed Jan 8 12:29:49 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 8 Jan 2025 12:29:49 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:07:16 GMT, Tobias Hartmann wrote: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. 
That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias I have a question about the test :) test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyNoInit.java line 32: > 30: * compiler.arraycopy.TestArrayCopyNoInit > 31: * @run main/othervm -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:TypeProfileLevel=020 > 32: * -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:-UseTLAB You have been able to reproduce this with `-UseCompressedClassPointers`, right? If so, I'd suggest we do a run config with `-UseCCP` instead of `+UseCOH`, because this gives us a cleaner way for backports, if we need one later. 
------------- PR Review: https://git.openjdk.org/jdk/pull/22967#pullrequestreview-2537002862 PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907099990 From thartmann at openjdk.org Wed Jan 8 12:29:49 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 12:29:49 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:07:16 GMT, Tobias Hartmann wrote: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias Thanks for the review, Roland! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2577545127 From thartmann at openjdk.org Wed Jan 8 12:29:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 12:29:50 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:17:05 GMT, Roland Westrelin wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > src/hotspot/share/opto/library_call.cpp line 5920: > >> 5918: // Keep track of the information that src/dest are arrays to prevent below array specific accesses from floating above. 
>> 5919: generate_non_array_guard(load_object_klass(src), slow_region); >> 5920: const Type* tary = TypeAryPtr::make(TypePtr::BotPTR, TypeAry::make(Type::BOTTOM, TypeInt::POS), nullptr, false, Type::OffsetBot); > > Is this never used elsewhere? Should it a static field in `TypeAryPtr` same as `TypeAryPtr::BYTES` and friends? I wondered as well and no, we don't use this type anywhere else (the closest would be `TypeAryPtr::RANGE`). We only create it when meeting arrays of primitive and non-primitive element type. Do you think this should still go to `TypeAryPtr::*`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907103145 From roland at openjdk.org Wed Jan 8 12:29:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 12:29:50 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:23:07 GMT, Tobias Hartmann wrote: >> src/hotspot/share/opto/library_call.cpp line 5920: >> >>> 5918: // Keep track of the information that src/dest are arrays to prevent below array specific accesses from floating above. >>> 5919: generate_non_array_guard(load_object_klass(src), slow_region); >>> 5920: const Type* tary = TypeAryPtr::make(TypePtr::BotPTR, TypeAry::make(Type::BOTTOM, TypeInt::POS), nullptr, false, Type::OffsetBot); >> >> Is this never used elsewhere? Should it a static field in `TypeAryPtr` same as `TypeAryPtr::BYTES` and friends? > > I wondered as well and no, we don't use this type anywhere else (the closest would be `TypeAryPtr::RANGE`). We only create it when meeting arrays of primitive and non-primitive element type. Do you think this should still go to `TypeAryPtr::*`? I would add it to `TypeAryPtr` (maybe as `TypeAryPtr::BOTTOM`) . The main benefit I see is that the new code would more readable if it referred to `TypeAryPtr::BOTTOM` rather than the long type creation expression. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907107453 From thartmann at openjdk.org Wed Jan 8 12:29:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 12:29:51 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:20:24 GMT, Aleksey Shipilev wrote: > You have been able to reproduce this with -UseCompressedClassPointers, right? No, I was never able to reproduce this with `-XX:-UseCompressedClassPointers`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907103924 From shade at openjdk.org Wed Jan 8 12:29:51 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 8 Jan 2025 12:29:51 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: <3d7ezYP3-By8ArAoM6IkKGH8XHZ4VNVcviiMzMy2EPQ=.d82874c8-40e0-42ce-808c-527e44aac2dc@github.com> On Wed, 8 Jan 2025 12:23:50 GMT, Tobias Hartmann wrote: >> test/hotspot/jtreg/compiler/arraycopy/TestArrayCopyNoInit.java line 32: >> >>> 30: * compiler.arraycopy.TestArrayCopyNoInit >>> 31: * @run main/othervm -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:TypeProfileLevel=020 >>> 32: * -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:-UseTLAB >> >> You have been able to reproduce this with `-UseCompressedClassPointers`, right? If so, I'd suggest we do a run config with `-UseCCP` instead of `+UseCOH`, because this gives us a cleaner way for backports, if we need one later. > >> You have been able to reproduce this with -UseCompressedClassPointers, right? > > No, I was never able to reproduce this with `-XX:-UseCompressedClassPointers`. 
OK, I was confused by this in PR body then: > I was able to reliably reproduce the issue with compiler/arraycopy/TestArrayCopyNoInit.java and -XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers on Linux AArch64 and verified that the fix solves the problem. But fine, if it reproduces with +UCOH, let it be there. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907107122 From epeter at openjdk.org Wed Jan 8 12:43:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 12:43:37 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Wed, 8 Jan 2025 11:37:32 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.hpp line 622: >> >>> 620: virtual uint hash() const { >>> 621: return (uintptr_t)in(1) + (uintptr_t)in(2) + Opcode(); >>> 622: } >> >> Can you explain why you do this instead of `Node::hash`? >> I think you do it so that you get the same hash if you swap the operands. >> I suggest you leave a comment in the code. > > I think it will be better to schedule this patch after https://github.com/openjdk/jdk/pull/22863 > where these nodes are marked as commutative operations. Ah ok, I will put that one in my review queue! There is lots that have accumulated over the last 2 weeks ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1907125031 From thartmann at openjdk.org Wed Jan 8 12:48:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 12:48:43 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: <3d7ezYP3-By8ArAoM6IkKGH8XHZ4VNVcviiMzMy2EPQ=.d82874c8-40e0-42ce-808c-527e44aac2dc@github.com> References: <3d7ezYP3-By8ArAoM6IkKGH8XHZ4VNVcviiMzMy2EPQ=.d82874c8-40e0-42ce-808c-527e44aac2dc@github.com> Message-ID: On Wed, 8 Jan 2025 12:26:28 GMT, Aleksey Shipilev wrote: >>> You have been able to reproduce this with -UseCompressedClassPointers, right? >> >> No, I was never able to reproduce this with `-XX:-UseCompressedClassPointers`. > > OK, I was confused by this in PR body then: > >> I was able to reliably reproduce the issue with compiler/arraycopy/TestArrayCopyNoInit.java and -XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers on Linux AArch64 and verified that the fix solves the problem. > > But fine, if it reproduces with +UCOH, let it be there. Ah, that's actually a typo, good catch. Should be `-XX:+UseCompactObjectHeaders`. I'll fix it in the description. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907130502 From shade at openjdk.org Wed Jan 8 12:52:35 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 8 Jan 2025 12:52:35 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:07:16 GMT, Tobias Hartmann wrote: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias I think bug should target 25 and then we backport it to 24. 
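For readers less familiar with what the array guard protects at the Java level: `System.arraycopy` is specified to throw `ArrayStoreException` when `src` or `dest` is not an array, and the guard in the intrinsic is what routes such calls to the slow path. The sketch below only shows that observable behavior. It does not reproduce the crash discussed in this thread, which depends on JIT compilation and the VM flags mentioned above; the class `ArrayCopyGuardSketch` and its helper are hypothetical names.

```java
// Hypothetical helper showing the Java-level contract behind the array guard.
final class ArrayCopyGuardSketch {
    // Attempts to copy one element and reports what happened.
    static String copyOutcome(Object src, Object dest) {
        try {
            System.arraycopy(src, 0, dest, 0, 1);
            return "copied";
        } catch (ArrayStoreException e) {
            // Thrown when src or dest is not an array (among other type mismatches).
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        System.out.println(copyOutcome(new int[] {1}, new int[1]));
        System.out.println(copyOutcome(new Object(), new int[1]));
    }
}
```

In the non-array case the length of `src` is never meaningful, which is why a length load that executes before the guard reads whatever happens to follow the object header.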
------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2577597242 From tweidmann at openjdk.org Wed Jan 8 13:02:24 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 13:02:24 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v3] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
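To make the idea of generating "interesting" values concrete, here is a minimal sketch of a generator that mixes boundary values with uniformly random ones. It is not the `Generators` API from the PR: the class `SpecialValueGeneratorSketch`, the `mixed` method, the `SPECIAL` table, and the percentage parameter are all assumptions made for illustration, built only on `java.util.Random` and `java.util.function.IntSupplier`.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Hypothetical sketch, unrelated to the actual Generators library in the PR.
final class SpecialValueGeneratorSketch {
    // Boundary values that tend to provoke overflow and sign edge cases.
    static final int[] SPECIAL = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};

    // Returns a supplier that yields a special value with the given probability
    // (in percent) and a uniformly random int otherwise.
    static IntSupplier mixed(Random rng, int specialPercent) {
        return () -> rng.nextInt(100) < specialPercent
                ? SPECIAL[rng.nextInt(SPECIAL.length)]
                : rng.nextInt();
    }

    public static void main(String[] args) {
        IntSupplier gen = mixed(new Random(42), 30);
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.getAsInt());
        }
    }
}
```

Returning a functional interface keeps such generators composable: one supplier can be wrapped by another that filters, clamps, or maps its output.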
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Allow more restricting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/bb5074d7..776afecd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=01-02 Stats: 278 lines in 5 files changed: 231 ins; 17 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From epeter at openjdk.org Wed Jan 8 13:05:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:05:40 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v4] In-Reply-To: References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: On Wed, 8 Jan 2025 10:45:41 GMT, Jatin Bhateja wrote: >> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. >> Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus >> obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. >> >> Following are the performance stats for JMH micro included with the patch. 
>> >> >> Granite Rapids (P-core Xeon Server) >> Baseline : >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms >> >> Sierra Forest (E-core Xeon Server) >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms >> VectorCommutativeOperSharingBenc... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > strict assertion check on commutative operation input count Looks promising, thanks for the work! src/hotspot/share/opto/phaseX.cpp line 148: > 146: if (k->in(0) == n->in(0) && > 147: (k->in(1) == n->in(1) || k->in(1) == n->in(2)) && > 148: (k->in(2) == n->in(1) || k->in(2) == n->in(2))) { Would be good to have IR tests that verify the commoning of all permutations. And that wrong patterns do not common: `add(x,y)` and `add(x,x)`. 
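To make the concern about wrong patterns concrete, here is a minimal sketch of the input comparison quoted above, assuming plain Java objects as stand-ins for C2 nodes. It is a toy model, not HotSpot code: the class and method names are hypothetical, the control-input check (`in(0)`) is left out, and hashing is not modelled, so it says nothing about whether such a pair actually reaches this comparison in practice. It only shows that the condition as quoted is not symmetric: with `k = add(x,x)` and `n = add(x,y)` it evaluates to true, whereas a check that accepts only "same order" or "exactly swapped" does not.

```java
// Toy model only: plain Java objects stand in for C2 nodes; this is not HotSpot code.
final class CommutativeMatchSketch {

    // The input comparison as quoted in the review, with k1/k2 standing for
    // k->in(1)/k->in(2) and n1/n2 for n->in(1)/n->in(2). Identity comparison.
    static boolean quotedMatch(Object k1, Object k2, Object n1, Object n2) {
        return (k1 == n1 || k1 == n2) && (k2 == n1 || k2 == n2);
    }

    // Hypothetical stricter alternative: inputs equal in the same order,
    // or equal after swapping the two inputs.
    static boolean strictMatch(Object k1, Object k2, Object n1, Object n2) {
        return (k1 == n1 && k2 == n2) || (k1 == n2 && k2 == n1);
    }

    public static void main(String[] args) {
        Object x = new Object();
        Object y = new Object();

        // k = add(x,x), n = add(x,y): these must not be treated as the same node.
        System.out.println(quotedMatch(x, x, x, y)); // true
        System.out.println(strictMatch(x, x, x, y)); // false

        // k = add(x,y), n = add(y,x): the swapped case that should be shared.
        System.out.println(quotedMatch(x, y, y, x)); // true
        System.out.println(strictMatch(x, y, y, x)); // true
    }
}
```

An IR test covering `add(x,x)` against `add(x,y)` in both roles would exercise exactly this asymmetry.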
src/hotspot/share/opto/phaseX.cpp line 157: > 155: if( n->in(i)!=k->in(i)) // Different inputs? > 156: goto collision; // "goto" is a speed hack... > 157: } I'm also not a fan of the `goto` hacks. Looks like bad design. I would vote for creating helper functions and calling those instead... test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java line 139: > 137: } > 138: } > 139: } As mentioned above, it would be good to have all combinations of 2 ops, with inputs x and y: add(x,x) with add(x,x) add(y,x) with add(x,x) add(x,y) with add(x,x) add(y,y) with add(x,x) add(x,x) with add(y,x) add(y,x) with add(y,x) add(x,y) with add(y,x) add(y,y) with add(y,x) add(x,x) with add(x,y) add(y,x) with add(x,y) add(x,y) with add(x,y) add(y,y) with add(x,y) add(x,x) with add(y,y) add(y,x) with add(y,y) add(x,y) with add(y,y) add(y,y) with add(y,y) At least do this for one operator. All may be a bit much to write... Templates would really be fantastic here... coming soon. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22863#pullrequestreview-2537058290 PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1907139452 PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1907141030 PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1907149667 From epeter at openjdk.org Wed Jan 8 13:05:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:05:41 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v2] In-Reply-To: References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> Message-ID: <3oh7LT4PookkaCAnTkEFlD52r5Qtik9_UsE5lMt-q5w=.1158bb15-0ad6-4f53-91f0-fc9ddef24596@github.com> On Wed, 8 Jan 2025 10:42:03 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectornode.hpp line 78: >> >>> 76: virtual uint hash() const { >>> 77: if (is_commutative_operation()) { >>> 78: return (uintptr_t)in(1) + (uintptr_t)in(2) + Opcode(); >> >> Commutative implies the operation is binary (`req == 3`). Should there be an assert somewhere to ensure it's always the case (ideally, when marking a node as commutative during construction)? > > Assertion check added. Where did you add it? I don't see it in the constructor or `add_flag`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1907132372 From epeter at openjdk.org Wed Jan 8 13:05:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:05:42 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v2] In-Reply-To: <3oh7LT4PookkaCAnTkEFlD52r5Qtik9_UsE5lMt-q5w=.1158bb15-0ad6-4f53-91f0-fc9ddef24596@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> <3oh7LT4PookkaCAnTkEFlD52r5Qtik9_UsE5lMt-q5w=.1158bb15-0ad6-4f53-91f0-fc9ddef24596@github.com> Message-ID: On Wed, 8 Jan 2025 12:47:23 GMT, Emanuel Peter wrote: >> Assertion check added. > > Where did you add it? I don't see it in the constructor or `add_flag`. I also wonder: if the flag `is_commutative_operation` is available at the `Node` level, and not just `VectorNode`, then should this logic here not be at `Node::hash` rather than `VectorNode::hash`? Otherwise, the flag should probably be called `is_commutative_vector_operation`, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1907134128 From thartmann at openjdk.org Wed Jan 8 13:08:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 13:08:53 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:50:14 GMT, Aleksey Shipilev wrote: > I think bug should target 25 and then we backport it to 24. Since we should fix the issue in JDK 24, the bug should remain targeted to JDK 24. The Skara bot will then take care of updating the fix version to JDK 25 and creating a backport to JDK 24 once we push this into master. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2577622654 From thartmann at openjdk.org Wed Jan 8 13:08:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 13:08:54 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:26:44 GMT, Roland Westrelin wrote: >> I wondered as well and no, we don't use this type anywhere else (the closest would be `TypeAryPtr::RANGE`). We only create it when meeting arrays of primitive and non-primitive element type. Do you think this should still go to `TypeAryPtr::*`? > > I would add it to `TypeAryPtr` (maybe as `TypeAryPtr::BOTTOM`) . The main benefit I see is that the new code would be more readable if it referred to `TypeAryPtr::BOTTOM` rather than the long type creation expression. Sounds good, I'll update the patch accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907151707 From thartmann at openjdk.org Wed Jan 8 13:08:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 13:08:52 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. 
That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Added missing stopped checks, refactoring and updated copyright dates ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22967/files - new: https://git.openjdk.org/jdk/pull/22967/files/425bbb6a..5c0292a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=00-01 Stats: 12 lines in 3 files changed: 6 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22967/head:pull/22967 PR: https://git.openjdk.org/jdk/pull/22967 From thartmann at openjdk.org Wed Jan 8 13:08:54 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 13:08:54 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:07:16 GMT, Tobias Hartmann wrote: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > 
https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias I updated the fix according to Roland's suggestions and also added missing stopped checks (otherwise, the Cast will become TOP and below code does not like that). ------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2577627210 From tweidmann at openjdk.org Wed Jan 8 13:10:07 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 13:10:07 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v4] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. 
The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Improve MockRandomnessSource documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/776afecd..5bb3fe9c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=02-03 Stats: 304 lines in 3 files changed: 133 ins; 122 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Wed Jan 8 13:11:39 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 13:11:39 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v4] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: On Wed, 8 Jan 2025 09:30:28 GMT, Emanuel Peter wrote: >> Do you mean specifically how this test here works or how the mocking works? Or both? > > Maybe just say what values you feed in, and why it produces the results. > That should hopefully help a future person who tries to extend the test for their own type. I documented MockRandomnessSource and added comments to this test case. I hope that provides enough clarifications to understand the other tests as it is always the same pattern (and this is definitely the most complicated one). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1907158053 From roland at openjdk.org Wed Jan 8 13:15:43 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 13:15:43 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:08:52 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. 
>> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing stopped checks, refactoring and updated copyright dates src/hotspot/share/opto/library_call.cpp line 5920: > 5918: // Keep track of the information that src/dest are arrays to prevent below array specific accesses from floating above. > 5919: generate_non_array_guard(load_object_klass(src), slow_region); > 5920: if (!stopped()) { Shouldn't we simply return then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907160417 From thartmann at openjdk.org Wed Jan 8 13:15:43 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 13:15:43 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:10:51 GMT, Roland Westrelin wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Added missing stopped checks, refactoring and updated copyright dates > > src/hotspot/share/opto/library_call.cpp line 5920: > >> 5918: // Keep track of the information that src/dest are arrays to prevent below array specific accesses from floating above. >> 5919: generate_non_array_guard(load_object_klass(src), slow_region); >> 5920: if (!stopped()) { > > Shouldn't we simply return then? But we need to set up the `slow_region` path, right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907162736 From tweidmann at openjdk.org Wed Jan 8 13:18:04 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 13:18:04 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v5] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <9GyszeY-Ky1NNGqjoDuY4pWWLmTykh-D0JE4RUw7wv4=.337ca244-452d-48b0-bb69-ccafad633474@github.com> > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Rename ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/5bb3fe9c..b3408c02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=03-04 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From roland at openjdk.org Wed Jan 8 13:20:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 13:20:40 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi In-Reply-To: References: Message-ID: On Thu, 19 Dec 2024 14:12:34 GMT, Christian Hagedorn wrote: > Looks reasonable. Does the new node come from here? > > https://github.com/openjdk/jdk/blob/f6e7713bb653811423eeb2515c2f69b437750326/src/hotspot/share/opto/memnode.cpp#L1211-L1215 Yes, it does. > Should we generally add some verification code that `Identity()` does not create new nodes? Hard to do it at all places but it could, for example, be done in the `transform()` methods of GVN and IGVN when calling `Identity()`. But of course, should probably be done in a separate RFE. That sounds like a good idea. 
I filed: https://bugs.openjdk.org/browse/JDK-8347266 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22818#issuecomment-2577652447 From epeter at openjdk.org Wed Jan 8 13:25:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:25:37 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 09:55:24 GMT, Theo Weidmann wrote: >> Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. >> >> For example, this basic case was not optimized before and is optimized with this PR: >> >> >> StringBuilder sb = new StringBuilder(); >> sb.append("a"); >> sb.append(a); >> return sb.toString(); > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Fix test name Looks promising. Though I did not work with string-opts before, so a little hard for me to give a proper review. If you want/need me to review, a few more annotations with github comments would help. Otherwise I'll just leave it at the drive-by comments ;) src/hotspot/share/opto/stringopts.cpp line 414: > 412: } > 413: > 414: PhaseStringOpts::CheckAppendResult PhaseStringOpts::check_append_candidate(CallStaticJavaNode* cnode, Using two verbs in succession is a little confusing. At least write `check_and_append_candidate`. Or maybe `append_candidate_if_`. test/hotspot/jtreg/compiler/stringopts/TestFluidAndNonFluid.java line 27: > 25: * @test > 26: * @bug 8341696 > 27: * @requires vm.compiler2.enabled Not sure if I asked about this already: do we need this C2 restriction? The IR framework only checks IR rules for C2, but the test could still do value verification for other settings where C2 is not available. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22537#pullrequestreview-2537099237 PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907161513 PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907155428 From tweidmann at openjdk.org Wed Jan 8 13:29:47 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 13:29:47 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:11:41 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test name > > src/hotspot/share/opto/stringopts.cpp line 414: > >> 412: } >> 413: >> 414: PhaseStringOpts::CheckAppendResult PhaseStringOpts::check_append_candidate(CallStaticJavaNode* cnode, > > Using two verbs in succession is a little confusing. At least write `check_and_append_candidate`. Or maybe `append_candidate_if_`. It's not supposed to be a verb. This method checks a potential `append` call (a call that might be a call to StringBuilder::append), so it's an *append candidate*. Do you have any suggestions to clarify that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907179981 From epeter at openjdk.org Wed Jan 8 13:32:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:32:45 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: On Wed, 18 Dec 2024 14:36:11 GMT, Yagmur Eren wrote: >> `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`.
Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 >> >> As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". >> To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > remove casting trick in set_idx @nelanbu Looks good to me, testing is passing. I would like a second reviewer to approve it though. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22646#pullrequestreview-2537151678 From roland at openjdk.org Wed Jan 8 13:39:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 13:39:39 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:12:37 GMT, Tobias Hartmann wrote: > But we need to set up the `slow_region` path, right? By that you mean have the `slow_region` feed into the uncommon trap that's only created later. It does feel weird that we know we have reached a dead end and we keep trying to add stuff, but ok then. The other thing is shouldn't the cast be added in `generate_non_array_guard()`? I see it's used elsewhere (`LibraryCallKit::inline_native_getLength()`): couldn't the same bug occur there? 
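For readers less familiar with these intrinsics, the following sketch shows the Java-level behaviour that the non-array guards have to preserve. It assumes that `inline_native_getLength` is the C2 intrinsic for `java.lang.reflect.Array.getLength`; the class and helper names are hypothetical. The sketch only demonstrates the specified exceptions for non-array arguments; it does not reproduce the crash itself, which depends on `-XX:+UseCompactObjectHeaders`, C2 compilation and heap layout.

```java
import java.lang.reflect.Array;

// Illustration of the specified behaviour for non-array arguments.
// The guarded slow path in the intrinsics must end up producing these
// exceptions rather than reading a length field that does not exist.
final class NonArrayGuardSketch {

    // Returns "ok" if the copy succeeds, otherwise the exception's simple name.
    static String copyFrom(Object src) {
        try {
            System.arraycopy(src, 0, new int[1], 0, 1);
            return "ok";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Returns the length as text, otherwise the exception's simple name.
    static String lengthOf(Object obj) {
        try {
            return Integer.toString(Array.getLength(obj));
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(copyFrom(new int[] {7}));  // ok
        System.out.println(copyFrom(new Object()));   // ArrayStoreException
        System.out.println(lengthOf(new int[3]));     // 3
        System.out.println(lengthOf(new Object()));   // IllegalArgumentException
    }
}
```

In both cases the array length may only be read once the object is known to be an array, which is why the same guard and cast question arises for both intrinsics.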
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907195835 From epeter at openjdk.org Wed Jan 8 13:40:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:40:40 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: <50-9rOI3nUDDlWuEy2HDy4VQAGEHZqCNEegZ5TuuAXk=.35e220fe-c3d5-47a0-ac81-ca5d598f83a2@github.com> On Wed, 8 Jan 2025 13:25:52 GMT, Theo Weidmann wrote: >> src/hotspot/share/opto/stringopts.cpp line 414: >> >>> 412: } >>> 413: >>> 414: PhaseStringOpts::CheckAppendResult PhaseStringOpts::check_append_candidate(CallStaticJavaNode* cnode, >> >> Using two verbs in succession is a little confusing. At least write `check_and_append_candidate`. Or maybe `append_candidate_if_`. > > It's not supposed to be a verb. This method checks a potential `append` call (a call that might be a call to StringBuilder::append), so it's an *append candidate*. Do you have any suggestions to clarify that? Hmm... Ok. In my opinion a `check` method should be "pure", so it should not really have a side-effect... but you do some `sc->add_control(cnode);` and `sc->push_string(arg);` etc. I'm not familiar enough with the code yet, but those side-effects. Ah, the comment in the hpp seems to suggest you `add` it. So maybe: `add_the_append_candidate_to_sc_if_` A bit long, but would be more helpful I think ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907196154 From roland at openjdk.org Wed Jan 8 13:42:18 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 13:42:18 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi [v2] In-Reply-To: References: Message-ID: > The assert fires during split thru phi because a call to `Identity` > returns a new node (a constant null pointer).
That happens because a > `Load`, once pushed thru phi, can be constant folded because it loads > from a newly allocated array. `Identity` shouldn't return new > nodes. When split thru phi runs, in this case, `Value` should be the > one returning constant null, not `Identity`. There is logic for that > in `LoadNode::Value` but it's after some other checks that cause > `Value` to return too early. > > To fix this, I propose reordering checks in `LoadNode::Value`. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - more - Merge branch 'master' into JDK-8346184 - more - test - fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22818/files - new: https://git.openjdk.org/jdk/pull/22818/files/64c538b2..f768e40b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22818&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22818&range=00-01 Stats: 24483 lines in 1154 files changed: 17153 ins; 4126 del; 3204 mod Patch: https://git.openjdk.org/jdk/pull/22818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22818/head:pull/22818 PR: https://git.openjdk.org/jdk/pull/22818 From tweidmann at openjdk.org Wed Jan 8 13:43:43 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 13:43:43 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: <50-9rOI3nUDDlWuEy2HDy4VQAGEHZqCNEegZ5TuuAXk=.35e220fe-c3d5-47a0-ac81-ca5d598f83a2@github.com> References: <50-9rOI3nUDDlWuEy2HDy4VQAGEHZqCNEegZ5TuuAXk=.35e220fe-c3d5-47a0-ac81-ca5d598f83a2@github.com> Message-ID: On Wed, 8 Jan 2025 13:37:45 GMT, Emanuel Peter wrote: >> It's not supposed to be a verb. This methods checks a potential `append` call (a call that might be a call to StringBuilder::append), so it's an *append candidate*. 
Do you have any suggestions to clarify that? > > Hmm... Ok. In my opinion a `check` method should be "pure", so it should not really have a side-effect... but you do some > `sc->add_control(cnode);` and `sc->push_string(arg);` etc. > > I'm not familiar enough with the code yet, but those side-effects. Ah, the comment in the hpp seems to suggest you `add` it. So maybe: > `add_the_append_candidate_to_sc_if_` > A bit long, but would be more helpful I think I think I will just call it `process_append_candidate` then. There's no point in trying to explain the complex logic in the method name. This method together with its callee would be better moved into another class anyhow, but I gave up this idea because of all the codependencies. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907201016 From roland at openjdk.org Wed Jan 8 13:47:39 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 13:47:39 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi In-Reply-To: <1YrrUWOx9bqfFZyjHx17SFivrlXcqsV_yaEE5Cux6rI=.716d5c8b-ac06-4843-ae62-8d4fe91f9747@github.com> References: <1YrrUWOx9bqfFZyjHx17SFivrlXcqsV_yaEE5Cux6rI=.716d5c8b-ac06-4843-ae62-8d4fe91f9747@github.com> Message-ID: On Thu, 19 Dec 2024 15:14:11 GMT, Christian Hagedorn wrote: > The new test is failing with the following flags on linux-x64-debug: > > ``` > -XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing > ``` The problem here is that the calls to `can_see_stored_value()` from `Identity` and `Value` are not guarded by the same conditions. The one from `Value` only happens if `ReduceFieldZeroing` is true but `Identity` doesn't check that flag. I fixed this by making `Identity` and `Value` call `can_see_stored_value()` unconditionally.
(see new commits) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22818#issuecomment-2577707132 From duke at openjdk.org Wed Jan 8 13:52:40 2025 From: duke at openjdk.org (Yagmur Eren) Date: Wed, 8 Jan 2025 13:52:40 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: On Wed, 8 Jan 2025 08:44:19 GMT, Emanuel Peter wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> remove casting trick in set_idx > > I reran testing for commit 2. Patch looks good though. Ping me again once the tests are complete. @eme64, thanks a lot for the review and testing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577718900 From thartmann at openjdk.org Wed Jan 8 13:56:46 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 13:56:46 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:37:31 GMT, Roland Westrelin wrote: > By that you mean have the slow_region feed into the uncommon trap that's only created later. Right. It's a bit weird but probably still the best solution in terms of complexity. The trap will lead to recompilation and then the `too_many_traps` check will trigger. > The other thing is shouldn't the cast be added in generate_non_array_guard()? I see it's used elsewhere (LibraryCallKit::inline_native_getLength()): couldn't the same bug occur there? Right, good catch. I think the use in `LibraryCallKit::inline_native_getLength` has the same problem. We can't easily put the cast into `generate_non_array_guard` though because it operates on the Klass and not on the object. The other `generate*array*guard` methods potentially have the same issue but current uses look good. 
I guess it's best to fix the `LibraryCallKit::inline_native_getLength` as well, i.e., make it the caller's responsibility to add a cast. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907220074 From epeter at openjdk.org Wed Jan 8 13:56:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:56:41 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: <50-9rOI3nUDDlWuEy2HDy4VQAGEHZqCNEegZ5TuuAXk=.35e220fe-c3d5-47a0-ac81-ca5d598f83a2@github.com> Message-ID: <2zG09KrUcK4sOmDMLQBGPGAA8G7An6LA-nyuVfS0Bn0=.806fe899-bdb2-4a2d-b556-78eeec8df9a0@github.com> On Wed, 8 Jan 2025 13:41:23 GMT, Theo Weidmann wrote: >> Hmm... Ok. In my opinion a `check` method should be "pure", so it should not really have a side-effect... but you do some >> `sc->add_control(cnode);` and `sc->push_string(arg);` etc. >> >> I'm not familiar enough with the code yet, but those side-effects. Ah, the comment in the hpp seems to suggest you `add` it. So maybe: >> `add_the_append_candidate_to_sc_if_` >> A bit long, but would be more helpful I think > > I think I will just call it `process_append_candidate` then. There's no point in trying to explain the complex logic in the method name. This method together with its callee would better be moved into another class anyhow, but I gave up this idea because of all the codependencies. Ok, so if I understand this right, this fully "processes" the `add`. That is why you call it `CheckAppendResult`. Then maybe the enum tags could be a bit more descriptive... `GoodAppend` -> `AddedAppendToStringConcat` `GiveUp` implies that the algo outside is supposed to give up... but then it does continue to do something out there... so who is giving up? Maybe there could be a better name here. `NotAppend` -> does this mean it is not an append, so you do nothing? -> `DidNothingBecauseNotAppend`...
You probably have even better ideas. Good names are hard. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907220250 From epeter at openjdk.org Wed Jan 8 13:56:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 13:56:43 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 09:55:24 GMT, Theo Weidmann wrote: >> Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. >> >> For example, this basic case was not optimized before and is optimized with this PR: >> >> >> StringBuilder sb = new StringBuilder(); >> sb.append("a"); >> sb.append(a); >> return sb.toString(); > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Fix test name src/hotspot/share/opto/stringopts.cpp line 613: > 611: } else if (use != nullptr && > 612: check_append_candidate(use, sc, m, string_sig, int_sig, char_sig) == CheckAppendResult::GiveUp) { > 613: return nullptr; What happens here in the two other cases `GoodAppend` and `NotAppend`? src/hotspot/share/opto/stringopts.cpp line 643: > 641: > 642: if (result == CheckAppendResult::GiveUp) { > 643: break; Can you put a comment here where this is supposed to jump, and why?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907211458 PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907213010 From epeter at openjdk.org Wed Jan 8 14:02:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 14:02:45 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 18:03:21 GMT, Daniel Lundén wrote: >> When searching for load anti dependences in GCM, it is not always sufficient to just search starting at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks. >> >> #### Example 1 >> >> Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti dependency for node 64 membar_release. To discover this anti dependency, we must rather search from 119 Phi which contains overlapping memory slices with 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). If we only search from 107 Phi, we fail to add the anti dependency to 64 membar_release and do not force the load to schedule before 64 membar_release as we should. In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti dependency on 64 membar_release). The result is the failure we see in this issue (we load the wrong value).
>> >> ![failure-graph-1](https://github.com/user-attachments/assets/e5458646-7a5c-40e1-b1d8-e3f101e29b73) >> ![failure-blocks-1](https://github.com/user-attachments/assets/a0b1f724-0809-4b2f-9feb-93e9c59a5d6a) >> >> #### Example 2 >> >> There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9. >> >> ![failure-graph-2](https://github.com/user-attachments/assets/ede0c299-6251-4ff8-8b84-af40a1ee9e8c) >> ![failure-blocks-2](https://github.com/user-attachments/assets/e5a87e43-b6fe-4fa3-8961-54752f63633e) >> >> ### Cha... > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Updates after comments src/hotspot/share/opto/gcm.cpp line 757: > 755: // In some cases, there are other relevant initial memory states besides > 756: // initial_mem. In such cases, we are rather dealing with multiple trees and > 757: // their fringes. If I look at these comments here (I reviewed a change by Roland a few months back, so my memory is coming back)... I see that the load is supposed to be scheduled before any `Memory state modifying nodes include Store and Phi` that is (transitively via any MergeMem) below the `initial_mem`. In your Example 1, why do we therefore not put an anti-dependency edge between the `183 load` and the `106 Phi`?
Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below `106 Phi`? Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time... I'll keep reading the changes now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1907230672 From thartmann at openjdk.org Wed Jan 8 14:03:37 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 14:03:37 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:53:20 GMT, Tobias Hartmann wrote: >>> But we need to set up the `slow_region` path, right? >> >> By that you mean have the `slow_region` feed into the uncommon trap that's only created later. It does feel weird that we know we have reached a dead end and we keep trying to add stuff, but ok then. >> The other thing is shouldn't the cast be added in `generate_non_array_guard()`? I see it's used elsewhere (`LibraryCallKit::inline_native_getLength()`): couldn't the same bug occur there? > >> By that you mean have the slow_region feed into the uncommon trap that's only created later. > > Right. It's a bit weird but probably still the best solution in terms of complexity. The trap will lead to recompilation and then the `too_many_traps` check will trigger. > >> The other thing is shouldn't the cast be added in generate_non_array_guard()? I see it's used elsewhere (LibraryCallKit::inline_native_getLength()): couldn't the same bug occur there? > > Right, good catch. I think the use in `LibraryCallKit::inline_native_getLength` has the same problem. We can't easily put the cast into `generate_non_array_guard` though because it operates on the Klass and not on the object. The other `generate*array*guard` methods potentially have the same issue but current uses look good. 
I guess it's best to fix the `LibraryCallKit::inline_native_getLength` as well, i.e., make it the caller's responsibility to add a cast. What do you think? Hmm, maybe `inline_getObjectSize` is affected as well: https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L8535-L8543 And `LibraryCallKit::inline_native_clone` as well: https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5257-L5262 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907228186 From qamai at openjdk.org Wed Jan 8 14:04:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Jan 2025 14:04:41 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: <-J84pRRV-7TDwl8CG8lBgLGWYarVHtBmfbbHjq5W9C4=.8c732404-0c2d-473f-9563-dd7ad55c4923@github.com> On Wed, 18 Dec 2024 14:36:11 GMT, Yagmur Eren wrote: >> `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`. Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 >> >> As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". >> To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. 
> > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > remove casting trick in set_idx The original intent is to avoid mistaken modification of the field, so I think it may be beneficial to change all accesses to a getter and limit the accessibility of the setter using friend. It should be done as a separate patch, I'm quite scared of the aggressiveness of the C++ compiler and not sure we always compile correctly. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/22646#pullrequestreview-2537219699 From tweidmann at openjdk.org Wed Jan 8 14:04:49 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 14:04:49 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:48:29 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test name > > src/hotspot/share/opto/stringopts.cpp line 613: > >> 611: } else if (use != nullptr && >> 612: check_append_candidate(use, sc, m, string_sig, int_sig, char_sig) == CheckAppendResult::GiveUp) { >> 613: return nullptr; > > What happens here in the two other cases `GoodAppend` and `NotAppend`? We don't really care here if it was an append or not. The only important thing is to exit this algorithm if the processing of the append candidate detected a reason to definitely give up. > src/hotspot/share/opto/stringopts.cpp line 643: > >> 641: >> 642: if (result == CheckAppendResult::GiveUp) { >> 643: break; > > Can you put a comment here where this is supposed to jump, and why? I was already considering to replace all the `break` with `return nullptr` here before, because all of them just exit from the entire algorithm because we detected something that completely prevents this optimization. 
But then I thought I will rather stick with the existing pattern and not change code that does not really need to be changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907226706 PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907229909 From tweidmann at openjdk.org Wed Jan 8 14:04:49 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 14:04:49 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:58:02 GMT, Theo Weidmann wrote: >> src/hotspot/share/opto/stringopts.cpp line 613: >> >>> 611: } else if (use != nullptr && >>> 612: check_append_candidate(use, sc, m, string_sig, int_sig, char_sig) == CheckAppendResult::GiveUp) { >>> 613: return nullptr; >> >> What happens here in the two other cases `GoodAppend` and `NotAppend`? > > We don't really care here if it was an append or not. The only important thing is to exit this algorithm if the processing of the append candidate detected a reason to definitely give up. If we had exceptions, the method could just return true and false, and throw to abort the optimization... >> src/hotspot/share/opto/stringopts.cpp line 643: >> >>> 641: >>> 642: if (result == CheckAppendResult::GiveUp) { >>> 643: break; >> >> Can you put a comment here where this is supposed to jump, and why? > > I was already considering to replace all the `break` with `return nullptr` here before, because all of them just exit from the entire algorithm because we detected something that completely prevents this optimization. But then I thought I will rather stick with the existing pattern and not change code that does not really need to be changed. Of course the way this entire loop works is a little confusing, but I didn't want to really refactor/rewrite all of this as it seems quite delicate and not trivial to test. 
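The three-way result discussed in this thread can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `AppendResult`, `process_candidate`, and `collect_appends` are stand-ins that operate on plain strings, not the code in `stringopts.cpp`. It only shows the shape of the control flow, assuming the caller does not distinguish between the two non-fatal outcomes and abandons the whole pass on `GiveUp`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical three-way outcome, mirroring the enum tags named in the thread.
enum class AppendResult { GoodAppend, NotAppend, GiveUp };

// Toy classifier (an assumption for illustration): "append:<text>" is
// recorded, "escape" forces a bail-out, and anything else is ignored.
AppendResult process_candidate(const std::string& use,
                               std::vector<std::string>& collected) {
  const std::string prefix = "append:";
  if (use == "escape") {
    return AppendResult::GiveUp;
  }
  if (use.compare(0, prefix.size(), prefix) == 0) {
    collected.push_back(use.substr(prefix.size()));  // side effect: record the argument
    return AppendResult::GoodAppend;
  }
  return AppendResult::NotAppend;
}

// Returns false as soon as any candidate reports GiveUp; `out` is only
// written when the whole sequence was processed successfully.
bool collect_appends(const std::vector<std::string>& uses,
                     std::vector<std::string>& out) {
  std::vector<std::string> collected;
  for (const std::string& use : uses) {
    if (process_candidate(use, collected) == AppendResult::GiveUp) {
      return false;  // abandon the whole pass, not just this candidate
    }
  }
  out = collected;
  return true;
}
```

With these toy rules, a sequence such as `{"append:a", "other", "append:b"}` yields the two recorded arguments, while any sequence containing `"escape"` makes `collect_appends` return `false` and leaves `out` untouched.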
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907233513 PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1907232574 From epeter at openjdk.org Wed Jan 8 14:04:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 14:04:42 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <-J84pRRV-7TDwl8CG8lBgLGWYarVHtBmfbbHjq5W9C4=.8c732404-0c2d-473f-9563-dd7ad55c4923@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> <-J84pRRV-7TDwl8CG8lBgLGWYarVHtBmfbbHjq5W9C4=.8c732404-0c2d-473f-9563-dd7ad55c4923@github.com> Message-ID: On Wed, 8 Jan 2025 13:57:39 GMT, Quan Anh Mai wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> remove casting trick in set_idx > > The original intent is to avoid mistaken modification of the field, so I think it may be beneficial to change all accesses to a getter and limit the accessibility of the setter using friend. It should be done as a separate patch, I'm quite scared of the aggressiveness of the C++ compiler and not sure we always compile correctly. @merykitty > The original intent is to avoid mistaken modification of the field Correct > I think it may be beneficial to change all accesses to a getter and limit the accessibility of the setter using friend I agree in principle - but we access `_idx` everywhere. That would really make backports much harder.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577746934 From epeter at openjdk.org Wed Jan 8 14:09:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 14:09:48 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: On Wed, 18 Dec 2024 14:36:11 GMT, Yagmur Eren wrote: >> `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`. Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 >> >> As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". >> To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > remove casting trick in set_idx Ok, just discussed it with some other engineers: Please file a Follow-up RFE. Make `Node::_idx` private, and replace all uses with an accessor. 
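A minimal sketch of what such a follow-up might look like, assuming a simplified stand-in for the real class. `Node` below is a toy class, and `idx()`, the private `set_idx()`, the friend `NodeRenumberer`, and `renumber_and_read()` are hypothetical names chosen for illustration; none of this is taken from HotSpot's `node.hpp`. The member is non-const, so reassigning it needs no cast, while the private setter plus `friend` keeps accidental writes out of ordinary code.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for a compiler IR node; NOT the real HotSpot Node class.
class Node {
 public:
  explicit Node(uint32_t idx) : _idx(idx) {}

  // Read-only access for all callers.
  uint32_t idx() const { return _idx; }

 private:
  // Non-const member: it can legally be reassigned without any const_cast.
  uint32_t _idx;

  // Writes are restricted to the (hypothetical) renumbering helper below.
  void set_idx(uint32_t new_idx) { _idx = new_idx; }
  friend class NodeRenumberer;
};

// Hypothetical helper that is the only code allowed to change the index.
class NodeRenumberer {
 public:
  static void renumber(Node& n, uint32_t new_idx) { n.set_idx(new_idx); }
};

// Example use: write through the helper, read through the accessor.
inline uint32_t renumber_and_read(Node& n, uint32_t new_idx) {
  NodeRenumberer::renumber(n, new_idx);
  assert(n.idx() == new_idx);  // the accessor observes the new value
  return n.idx();
}
```

In this sketch, code outside `NodeRenumberer` that tries to call `n.set_idx(...)` or assign `n._idx` fails to compile, which preserves the original intent of the `const` qualifier without relying on undefined behavior.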
------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577758647 From epeter at openjdk.org Wed Jan 8 14:11:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 14:11:46 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 14:00:24 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 757: > >> 755: // In some cases, there are other relevant initial memory states besides >> 756: // initial_mem. In such cases, we are rather dealing with multiple trees and >> 757: // their fringes. > > If I look at these comments here (I reviewed a change by Roland a few months back, so my memory is coming back)... > I see that the load is supposed to be scheduled before any `Memory state modifying nodes include Store and Phi` that is (transitively via any MergeMem) below the `initial_mem`. > > In your Example 1, why do we therefore not put an anti-dependency edge between the `183 load` and the `106 Phi`? Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below `106 Phi`? > > Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time... > I'll keep reading the changes now. And in example 2, we should schedule before the Phi as well: ![image](https://github.com/user-attachments/assets/3d035602-fe4b-4c34-98fe-d2935fed92e0) Why don't we do that?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1907242566 From duke at openjdk.org Wed Jan 8 14:12:36 2025 From: duke at openjdk.org (Yagmur Eren) Date: Wed, 8 Jan 2025 14:12:36 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <-J84pRRV-7TDwl8CG8lBgLGWYarVHtBmfbbHjq5W9C4=.8c732404-0c2d-473f-9563-dd7ad55c4923@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> <-J84pRRV-7TDwl8CG8lBgLGWYarVHtBmfbbHjq5W9C4=.8c732404-0c2d-473f-9563-dd7ad55c4923@github.com> Message-ID: On Wed, 8 Jan 2025 13:57:39 GMT, Quan Anh Mai wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> remove casting trick in set_idx > > The original intent is to avoid mistaken modification of the field, so I think it may be beneficial to change all accesses to a getter and limit the accessibility of the setter using friend. It should be done as a separate patch, I'm quite scared of the aggressiveness of the C++ compiler and not sure we always compile correctly. @merykitty thanks for the review! > Please file a Follow-up RFE. Make Node::_idx private, and replace all uses with an accessor. @eme64, sounds like a good idea. I'll file an enhancement. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577766517 From epeter at openjdk.org Wed Jan 8 14:13:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 14:13:40 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: <03ozC1NfpoBMN8fyLJY6gt2_7GZQpDtTHEj8cgxD_dU=.dd851537-820d-4b72-acf9-b170aa756e4b@github.com> References: <03ozC1NfpoBMN8fyLJY6gt2_7GZQpDtTHEj8cgxD_dU=.dd851537-820d-4b72-acf9-b170aa756e4b@github.com> Message-ID: On Mon, 16 Dec 2024 14:19:49 GMT, Jatin Bhateja wrote: >>> > Can you quickly summarize what tests you have, and what they test? >>> >>> Patch includes functional and performance tests, as per your suggestions IR framework-based tests now cover various special cases for constant folding transformation. Let me know if you see any gaps. >> >> I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours. > >> > > Can you quickly summarize what tests you have, and what they test? >> > >> > >> > Patch includes functional and performance tests, as per your suggestions IR framework-based tests now cover various special cases for constant folding transformation. Let me know if you see any gaps. >> >> I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours. > > > Validations details:- > > A) x86 backend changes > - new assembler instruction > - macro assembly routines. 
> Test point:- test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java > - This test is based on the TestNG framework and includes new DataProviders to generate test vectors. > - Test vectors cover the entire float16 value range and also special floating point values (NaN, +Inf, -Inf, 0.0 and -0.0) > B) GVN transformations:- > - Value Transforms > Test point:- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java > - Covers all the constant folding scenarios for add, sub, mul, div, sqrt, fma, min, and max operations addressed by this patch. > - It also tests special case scenarios for each operation as specified by the Java language specification. > - identity Transforms > Test point:- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java > - Covers identity transformation for ReinterpretS2HFNode, DivHFNode > - idealization Transforms > Test points:- test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java > :- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java > - Contains test point for the following transform > MulHF idealization i.e. MulHF * 2 => AddHF > - Contains test point for the following transform > DivHF SRC , PoT(constant) => MulHF SRC * reciprocal (constant) > - Contains idealization test points for the following transform > ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) => > ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y))) @jatin-bhateja Is this ready for another review pass?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2577768041 From roland at openjdk.org Wed Jan 8 14:16:37 2025 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 8 Jan 2025 14:16:37 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:59:00 GMT, Tobias Hartmann wrote: >>> By that you mean have the slow_region feed into the uncommon trap that's only created later. >> >> Right. It's a bit weird but probably still the best solution in terms of complexity. The trap will lead to recompilation and then the `too_many_traps` check will trigger. >> >>> The other thing is shouldn't the cast be added in generate_non_array_guard()? I see it's used elsewhere (LibraryCallKit::inline_native_getLength()): couldn't the same bug occur there? >> >> Right, good catch. I think the use in `LibraryCallKit::inline_native_getLength` has the same problem. We can't easily put the cast into `generate_non_array_guard` though because it operates on the Klass and not on the object. The other `generate*array*guard` methods potentially have the same issue but current uses look good. I guess it's best to fix the `LibraryCallKit::inline_native_getLength` as well, i.e., make it the caller's responsibility to add a cast. What do you think? > > Hmm, maybe `inline_getObjectSize` is affected as well: > > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L8535-L8543 > > And `LibraryCallKit::inline_native_clone` as well: > > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5257-L5262 > I guess it's best to fix the `LibraryCallKit::inline_native_getLength` as well, i.e., make it the caller's responsibility to add a cast. What do you think? Maybe the methods need to take an extra parameter (the object to cast)? 
Having the cast in the method would lead to less code duplication and a lower risk of forgetting the cast when new calls of the method are added so that's what I would go with unless it's really a pain. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907249957 From qamai at openjdk.org Wed Jan 8 14:48:37 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Jan 2025 14:48:37 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: <_yc20IczkSNEitOGPYHMSMdzV2gKrPdDQKk4c_Nlb1w=.8c4cbe5b-7da5-481b-be92-0ca06374a20b@github.com> On Wed, 8 Jan 2025 14:06:53 GMT, Emanuel Peter wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> remove casting trick in set_idx > > Ok, just discussed it with some other engineers: > Please file a Follow-up RFE. Make `Node::_idx` private, and replace all uses with an accessor. You need 2 people with the reviewer role for this, not sure if it is the intention of @eme64 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577850073 From duke at openjdk.org Wed Jan 8 14:54:38 2025 From: duke at openjdk.org (Yagmur Eren) Date: Wed, 8 Jan 2025 14:54:38 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: <0_9qnnULKUSiN2W82J-1C9w6ipDAMLpRounLQJgMl30=.04c093b1-3868-4747-9382-2530198971b6@github.com> On Wed, 8 Jan 2025 14:06:53 GMT, Emanuel Peter wrote: >> Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: >> >> remove casting trick in set_idx > > Ok, just discussed it with some other engineers: > Please file a Follow-up RFE. 
Make `Node::_idx` private, and replace all uses with an accessor. > You need 2 people with the reviewer role for this, not sure if it is the intention of @eme64 Yes, I realized a bit late that the second one was a committer approval... Thanks for highlighting it @merykitty. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577863922 From tweidmann at openjdk.org Wed Jan 8 14:56:47 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 14:56:47 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v6] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <6dCasXkVmDYEy3mhQ88IdgWLUttipBz1LRPhIRJRYgA=.dff6df33-efa7-4dfa-a7f6-9897ca96f23c@github.com> > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. 
This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Add safe restrict ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/b3408c02..6e247dcd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=04-05 Stats: 86 lines in 3 files changed: 85 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Wed Jan 8 14:56:47 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Wed, 8 Jan 2025 14:56:47 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v6] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <2RYMpkapuoewCcYdF8OoT94EZGN1cLGgJnf7oozRQrg=.cf85bf87-10ac-4ee8-946d-e0496b0a5c83@github.com> Message-ID: <9moUX9NPEqVdDk8tlAEl965VGVFbeYnbBbIlClI59aY=.8e2c0aab-5b08-46e2-901a-164184905182@github.com> On Tue, 7 Jan 2025 14:22:53 GMT, Emanuel Peter wrote: >> But what is it good for to draw from a range with no elements? What is supposed to happen then? > > Well, it is more about this: > I want to be able to draw from restricted ranges in the Templates. But the distribution should be random. Maybe the solution is just to make sure that for `Generators.ints`, we always mix in uniform, but at a very low weight. That way, if all other sub-distribution of a mixed distribution fall away (empty), we at least still have the uniform distribution. > Because if the template wants a range, then we must sample something from that range. 
> Does that make sense? As discussed I have added G.safeRestrict* Methods that will try to call restrict but instead return a uniform generator of the same range in case the restriction is impossible. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1907308740 From epeter at openjdk.org Wed Jan 8 15:09:42 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 15:09:42 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <_yc20IczkSNEitOGPYHMSMdzV2gKrPdDQKk4c_Nlb1w=.8c4cbe5b-7da5-481b-be92-0ca06374a20b@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> <_yc20IczkSNEitOGPYHMSMdzV2gKrPdDQKk4c_Nlb1w=.8c4cbe5b-7da5-481b-be92-0ca06374a20b@github.com> Message-ID: On Wed, 8 Jan 2025 14:45:33 GMT, Quan Anh Mai wrote: >> Ok, just discussed it with some other engineers: >> Please file a Follow-up RFE. Make `Node::_idx` private, and replace all uses with an accessor. > > You need 2 people with the reviewer role for this, not sure if it is the intention of @eme64 I am lowering it, @merykitty is quite competent at reviewing this :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577903214 From duke at openjdk.org Wed Jan 8 15:13:42 2025 From: duke at openjdk.org (duke) Date: Wed, 8 Jan 2025 15:13:42 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> Message-ID: On Wed, 18 Dec 2024 14:36:11 GMT, Yagmur Eren wrote: >> `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`. 
Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 >> >> As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". >> To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. > > Yagmur Eren has updated the pull request incrementally with one additional commit since the last revision: > > remove casting trick in set_idx @nelanbu Your change (at version 2322aa9c7ed161e75fe42731be9cadef6fead459) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577912816 From duke at openjdk.org Wed Jan 8 15:17:58 2025 From: duke at openjdk.org (Yagmur Eren) Date: Wed, 8 Jan 2025 15:17:58 GMT Subject: Integrated: 8345580: Remove const from Node::_idx which is modified In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 15:16:32 GMT, Yagmur Eren wrote: > `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`. Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 > > As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". 
> To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. This pull request has now been integrated. Changeset: ae3fc464 Author: Yagmur Eren Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/ae3fc464563ad1ba59883ccf60d235b42f5ad7fa Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8345580: Remove const from Node::_idx which is modified Reviewed-by: epeter, qamai ------------- PR: https://git.openjdk.org/jdk/pull/22646 From epeter at openjdk.org Wed Jan 8 15:22:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 8 Jan 2025 15:22:43 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <0_9qnnULKUSiN2W82J-1C9w6ipDAMLpRounLQJgMl30=.04c093b1-3868-4747-9382-2530198971b6@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> <0_9qnnULKUSiN2W82J-1C9w6ipDAMLpRounLQJgMl30=.04c093b1-3868-4747-9382-2530198971b6@github.com> Message-ID: On Wed, 8 Jan 2025 14:51:34 GMT, Yagmur Eren wrote: >> Ok, just discussed it with some other engineers: >> Please file a Follow-up RFE. Make `Node::_idx` private, and replace all uses with an accessor. > >> You need 2 people with the reviewer role for this, not sure if it is the intention of @eme64 > > Yes, I realized a bit late that the second one was a committer approval... Thanks for highlighting it @merykitty. @nelanbu do you want to work on https://bugs.openjdk.org/browse/JDK-8347275 yourself, or should I find a new-hire / beginner external to work on this? 
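
For readers following the `Node::_idx` thread, here is a minimal sketch of the shape the follow-up RFE describes: a non-const private field reached only through accessors. It is not the HotSpot code. The class is a simplified stand-in, and the accessor name `idx()` and the `node_idx_t` typedef are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed index type for this sketch only.
typedef uint32_t node_idx_t;

// Hypothetical, heavily simplified stand-in for a C2 node class.
class Node {
 private:
  // Deliberately not `const`: set_idx() modifies it after construction.
  // Declaring it `const` and writing through a cast would be undefined behavior.
  node_idx_t _idx;

 public:
  explicit Node(node_idx_t idx) : _idx(idx) {}

  // Read-only accessor, so callers never touch the field directly.
  node_idx_t idx() const { return _idx; }

  // Plain assignment; no casting trick is needed once `const` is gone.
  void set_idx(node_idx_t idx) { _idx = idx; }
};
```

With the field private, any new code that wants to change the index has to go through `set_idx`, which keeps the mutation points easy to find.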
------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2577937823 From dlunden at openjdk.org Wed Jan 8 15:24:39 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 8 Jan 2025 15:24:39 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 14:08:40 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/gcm.cpp line 757: >> >>> 755: // In some cases, there are other relevant initial memory states besides >>> 756: // initial_mem. In such cases, we are rather dealing with multiple trees and >>> 757: // their fringes. >> >> If I look at these comments here (I reviewed a change by Roland a few months back, so my memory is coming back)... >> I see that the load is supposed to be scheduled before any `Memory state modifying nodes include Store and Phi` that is (transitively via any MergeMem) below the `initial_mem`. >> >> In your example 1, why do we therefore not put an anti-dependency edge between the `183 load`, and the `106 Phi`? Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below `106 Phi`? >> >> Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time... >> I'll keep reading the changes now. > > And in example 2, we should schedule before the Phi as well: > ![image](https://github.com/user-attachments/assets/3d035602-fe4b-4c34-98fe-d2935fed92e0) > > Why don't we do that? Thanks for the comments @eme64! > In your example 1, why do we therefore not put an anti-dependency edge between the 183 load, and the 106 Phi? Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below 106 Phi? > > Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time... I'll keep reading the changes now.
Yes, Phis only result in LCA changes at the block level and we never add anti-dependence edges directly between the load and Phi nodes. In example 1, we do mark the last block in between the path from 107 Phi to 106 Phi (which is B27) for raising the LCA. However, when doing the joint LCA raising operation later on (`raise_LCA_above_marks`), we start at the original LCA and stop when we reach the early block (B20). Therefore, we never even consider B27. My very first attempt at solving this issue was to try and identify some dominance relation between the early block and the blocks for and in between 107 Phi and 106 Phi and use this information to force the LCA to the early block. This kind of worked at the block level, but we still need to identify somehow that we need an anti-dependence edge to 64 membar_release. Otherwise, it can happen that the load is scheduled correctly in the early block, but incorrectly (after an overwriting store) within the block (easily verified with `-XX:+StressLCM`). > And in example 2, we should schedule before the Phi as well: Why don't we do that? Same here as above, we do tag both B19 and B21 for raising the LCA, but never consider them in `raise_LCA_above_marks` since they are above the early block B9. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1907363858 From syan at openjdk.org Wed Jan 8 15:26:44 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 8 Jan 2025 15:26:44 GMT Subject: [jdk24] RFR: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: References: Message-ID: <7ZXEQSQc3S6-tTAcesNlXXwLDvauM3OF6ZFNhGvjKpI=.76d6efce-1f61-40d6-b215-2900daabe58f@github.com> On Tue, 7 Jan 2025 15:14:34 GMT, SendaoYan wrote: > Hi all, > > This pull request contains a backport of commit [cf3e48e7](https://github.com/openjdk/jdk/commit/cf3e48e77172db7e27530af9754e1ead8d493f52) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
> > The commit being backported was authored by SendaoYan on 7 Jan 2025 and was reviewed by Vladimir Kozlov. > > Thanks! Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22950#issuecomment-2577945479 From syan at openjdk.org Wed Jan 8 15:26:44 2025 From: syan at openjdk.org (SendaoYan) Date: Wed, 8 Jan 2025 15:26:44 GMT Subject: [jdk24] Integrated: 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 15:14:34 GMT, SendaoYan wrote: > Hi all, > > This pull request contains a backport of commit [cf3e48e7](https://github.com/openjdk/jdk/commit/cf3e48e77172db7e27530af9754e1ead8d493f52) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by SendaoYan on 7 Jan 2025 and was reviewed by Vladimir Kozlov. > > Thanks! This pull request has now been integrated. Changeset: c3b52089 Author: SendaoYan URL: https://git.openjdk.org/jdk/commit/c3b52089f6d504370e10105bda781513c1f66246 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8346965: Multiple compiler/ciReplay test fails with -XX:+SegmentedCodeCache Reviewed-by: epeter Backport-of: cf3e48e77172db7e27530af9754e1ead8d493f52 ------------- PR: https://git.openjdk.org/jdk/pull/22950 From thartmann at openjdk.org Wed Jan 8 15:55:49 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 8 Jan 2025 15:55:49 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 14:13:32 GMT, Roland Westrelin wrote: >> Hmm, maybe `inline_getObjectSize` is affected as well: >> >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L8535-L8543 >> >> And `LibraryCallKit::inline_native_clone` as well: >> >> 
https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5257-L5262 > >> I guess it's best to fix the `LibraryCallKit::inline_native_getLength` as well, i.e., make it the caller's responsibility to add a cast. What do you think? > > Maybe the methods need to take an extra parameter (the object to cast)? > Having the cast in the method would lead to less code duplication and a lower risk of forgetting the cast when new calls of the method are added so that's what I would go with unless it's really a pain. Right, I'll give that a try. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907411409 From qamai at openjdk.org Wed Jan 8 16:17:04 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 8 Jan 2025 16:17:04 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:08:52 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. 
>> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Added missing stopped checks, refactoring and updated copyright dates src/hotspot/share/opto/library_call.cpp line 5921: > 5919: generate_non_array_guard(load_object_klass(src), slow_region); > 5920: if (!stopped()) { > 5921: src = _gvn.transform(new CheckCastPPNode(control(), src, TypeAryPtr::BOTTOM)); Why is this a `CheckCastPP` and not a `CastPP`? My understanding is that a `CheckCastPP` is used when we force changing the type of a node (e.g a raw pointer of `Allocate` into a typed pointer), so we do not join the type of the input with that of the output. src/hotspot/share/opto/type.hpp line 1476: > 1474: > 1475: // Convenience common pre-built types. 
> 1476: static const TypeAryPtr* BOTTOM; While you are here it may be better to change the other constant to `TypeAryPtr*` instead of `TypeAryPtr *` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907435011 PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1907440010 From dlong at openjdk.org Thu Jan 9 01:26:39 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 9 Jan 2025 01:26:39 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v6] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Mon, 16 Dec 2024 16:59:57 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. >> >> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data.
> > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > removing dead code This looks good now, but I just noticed the old code would have replaced the oop_Relocation with an internal_word_Relocation for the NearCpool==false case. How did that ever work correctly? ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21276#pullrequestreview-2538594542 From fyang at openjdk.org Thu Jan 9 02:01:05 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 Jan 2025 02:01:05 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v3] In-Reply-To: References: Message-ID: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> > Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: > TEST: compiler/c2/irTests/ModINodeIdealizationTests.java > TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java > > These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. > > Some further information: > An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786).
> > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22874/files - new: https://git.openjdk.org/jdk/pull/22874/files/27884e81..c6302209 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22874&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22874&range=01-02 Stats: 10 lines in 2 files changed: 4 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22874/head:pull/22874 PR: https://git.openjdk.org/jdk/pull/22874 From fyang at openjdk.org Thu Jan 9 02:10:35 2025 From: fyang at openjdk.org (Fei Yang) Date: Thu, 9 Jan 2025 02:10:35 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 11:43:00 GMT, Hamlin Li wrote: > Thanks for fixing the failing tests. Some comments, seems only powerOf2Minus1 in both tests failed. Can we only disable the IR verification of powerOf2Minus1, and keep other test enabled? Good suggestion! I have updated accordingly and rerun the two tests. Both of them are now selected and pass. And IR verification of powerOf2Minus1 is skipped for riscv64 platform. Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22874#issuecomment-2579024775 From fjiang at openjdk.org Thu Jan 9 02:32:37 2025 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 9 Jan 2025 02:32:37 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v3] In-Reply-To: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> References: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> Message-ID: On Thu, 9 Jan 2025 02:01:05 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. >> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Marked as reviewed by fjiang (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/22874#pullrequestreview-2538623413 From amitkumar at openjdk.org Thu Jan 9 05:45:08 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 9 Jan 2025 05:45:08 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v8] In-Reply-To: References: Message-ID: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - Merge branch 'master' into tf_v2 - fix test case - new year - fixes code style, restores gc changes - [wip] renamed caches, changed type factories, refactored code - Merge branch 'master' into tf_v2 - test fix - Merge branch 'master' into tf_v2 - fixing the merge conflict - cover more TypeFunc objects - ... and 7 more: https://git.openjdk.org/jdk/compare/a46ae703...22d97e4c ------------- Changes: https://git.openjdk.org/jdk/pull/21782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=07 Stats: 765 lines in 8 files changed: 510 ins; 91 del; 164 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From amitkumar at openjdk.org Thu Jan 9 05:45:08 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 9 Jan 2025 05:45:08 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v5] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 03:18:44 GMT, Vladimir Ivanov wrote: > Otherwise, you need to guard it with INCLUDE_SHENANDOAHGC there. @iwanowww Should I do this ? Because at least we have to have make a call, initialization call, from shared space otherwise we can't cache it. 
I can do some refactoring and that first call I can guard with `INCLUDE_SHENANDOAHGC` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1896423212 From vlivanov at openjdk.org Thu Jan 9 05:45:08 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 9 Jan 2025 05:45:08 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v5] In-Reply-To: References: Message-ID: On Tue, 24 Dec 2024 05:52:39 GMT, Amit Kumar wrote: >> src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 438: >> >>> 436: >>> 437: const TypeFunc* ShenandoahBarrierSetC2::write_ref_field_pre_Type() { >>> 438: return OptoRuntime::write_ref_field_pre_Type(); >> >> Please, keep them local to `ShenandoahBarrierSetC2`. Otherwise, you need to guard it with `INCLUDE_SHENANDOAHGC` there. > >> Otherwise, you need to guard it with INCLUDE_SHENANDOAHGC there. > > @iwanowww Should I do this ? Because at least we have to have make a call, initialization call, from shared space otherwise we can't cache it. > > I can do some refactoring and that first call I can guard with `INCLUDE_SHENANDOAHGC` Good point. I still think it's cleaner to keep type factories local (including `BarrierSetC2::clone_type()`). GC interface could provide an API point to call from `Type::Initialize_shared()` to trigger GC-specific initialization. If it turns out to be too cumbersome to implement, I'd just leave GC-specific code intact and handle it as a separate RFE. `OptoRuntime` changes already justify the enhancement and look good to me. 
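
The pattern under discussion in this thread (build a `TypeFunc` once from a shared initialization point, then hand out the cached pointer) might be sketched roughly as follows. This is an illustration under assumptions, not the patch: `TypeFunc` here is a dummy struct, and `LockNodeSketch`, `initialize_lock_Type` and `g_types_built` are hypothetical names; only `_lock_type_Type` and `lock_type()` appear in the review comments.

```cpp
#include <cassert>

// Dummy stand-in; the real C2 TypeFunc is built from domain and range tuples.
struct TypeFunc {
  int id;
};

// Hypothetical counter, present only to show that construction happens once.
static int g_types_built = 0;

class LockNodeSketch {
  // Cache slot, filled exactly once during start-up.
  static const TypeFunc* _lock_type_Type;

 public:
  // Assumed to be invoked once from a shared initializer
  // (the thread mentions Type::Initialize_shared() as that place).
  static void initialize_lock_Type() {
    assert(_lock_type_Type == nullptr);
    // Kept for the lifetime of the process, so it is never freed.
    _lock_type_Type = new TypeFunc{++g_types_built};
  }

  // Cheap accessor: returns the cached pointer instead of rebuilding the type.
  static const TypeFunc* lock_type() {
    assert(_lock_type_Type != nullptr);
    return _lock_type_Type;
  }
};

const TypeFunc* LockNodeSketch::_lock_type_Type = nullptr;
```

A GC-specific barrier set could follow the same shape with its own initializer, which is the API point suggested in the review; guarding that call with `INCLUDE_SHENANDOAHGC` would then be the caller's job.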
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906013441 From amitkumar at openjdk.org Thu Jan 9 05:45:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 9 Jan 2025 05:45:09 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v5] In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 20:39:32 GMT, Vladimir Ivanov wrote: > I still think it's cleaner to keep type factories local (including BarrierSetC2::clone_type()). GC interface could provide an API point to call from Type::Initialize_shared() to trigger GC-specific initialization. I can follow the same approach I followed for `CallNode` and `ArrayCopyNode` structure, if that would be fine ? I also feel that same change can be done for `CallNode` and `ArrayCopyNode` as they also need their own initializer. I am fine with both, fixing it here or fixing with separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906535686 From vlivanov at openjdk.org Thu Jan 9 05:45:09 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 9 Jan 2025 05:45:09 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v7] In-Reply-To: References: Message-ID: On Tue, 24 Dec 2024 05:39:50 GMT, Amit Kumar wrote: >> Lazy computation of TypeFunc. >> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > [wip] renamed caches, changed type factories, refactored code src/hotspot/share/opto/callnode.hpp line 1193: > 1191: // > 1192: class LockNode : public AbstractLockNode { > 1193: static const TypeFunc *_lock_type_Type; Comment on code style: `TypeFunc* _lock_type_Type`. 
src/hotspot/share/opto/callnode.hpp line 1196: > 1194: public: > 1195: > 1196: static inline const TypeFunc *lock_type() { Suggestion: `static inline const TypeFunc* lock_type() {` src/hotspot/share/opto/type.cpp line 716: > 714: mreg2type[Op_VecZ] = TypeVect::VECTZ; > 715: > 716: LockNode::lock_type_init(); Would be nice to have consistent naming here (e.g., `initialize_types` or even `initialize_c2_types`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906000714 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906001829 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906003717 From amitkumar at openjdk.org Thu Jan 9 05:45:09 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 9 Jan 2025 05:45:09 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v7] In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 20:28:20 GMT, Vladimir Ivanov wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> [wip] renamed caches, changed type factories, refactored code > > src/hotspot/share/opto/callnode.hpp line 1196: > >> 1194: public: >> 1195: >> 1196: static inline const TypeFunc *lock_type() { > > Suggestion: `static inline const TypeFunc* lock_type() {` Sorry, that was my editor's default behaviour. Updated. > src/hotspot/share/opto/type.cpp line 716: > >> 714: mreg2type[Op_VecZ] = TypeVect::VECTZ; >> 715: >> 716: LockNode::lock_type_init(); > > Would be nice to have consistent naming here (e.g., `initialize_types` or even `initialize_c2_types`). I have updated name, can you take another look if they look fine now ? 
Thanks ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906533137 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1906532013 From jbhateja at openjdk.org Thu Jan 9 06:10:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Jan 2025 06:10:13 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v5] In-Reply-To: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: > Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. > Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus > obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. > > Following are the performance stats for JMH micro included with the patch. 
> > > Granite Rapids (P-core Xeon Server) > Baseline : > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms > > Sierra Forest (E-core Xeon Server) > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt ... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22863/files - new: https://git.openjdk.org/jdk/pull/22863/files/39184960..e9be0de1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=03-04 Stats: 347 lines in 5 files changed: 296 ins; 21 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From jbhateja at openjdk.org Thu Jan 9 06:10:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Jan 2025 06:10:14 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v4] In-Reply-To: References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: On Wed, 8 Jan 2025 13:02:14 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> strict assertion check on commutative operation input count > > test/hotspot/jtreg/compiler/vectorapi/VectorCommutativeOperSharingTest.java line 139: > >> 137: } >> 138: } >> 139: } > > As mentioned above, it would be good to have all combinations of 2 ops, with inputs x and y: > > add(x,x) with add(x,x) > add(y,x) with add(x,x) > add(x,y) with add(x,x) > add(y,y) with add(x,x) > add(x,x) with add(y,x) > add(y,x) with add(y,x) > add(x,y) with add(y,x) > add(y,y) with add(y,x) > add(x,x) with add(x,y) > add(y,x) with add(x,y) > add(x,y) with add(x,y) > add(y,y) with add(x,y) > add(x,x) with add(y,y) > add(y,x) with add(y,y) > add(x,y) with add(y,y) > add(y,y) with add(y,y) > > At least do this for one operator. All may be a bit much to write... Templates would really be fantastic here... coming soon. 
Done, I think templatization will add value in some cases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1908211690 From jbhateja at openjdk.org Thu Jan 9 06:10:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Jan 2025 06:10:13 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v2] In-Reply-To: References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> <9Mx8rZTEbpj0Cxbxl6qYDSipvRA0wWYhjcw2BI7gYzo=.2217a5a9-f031-43db-b0ed-3220a38de382@github.com> <3oh7LT4PookkaCAnTkEFlD52r5Qtik9_UsE5lMt-q5w=.1158bb15-0ad6-4f53-91f0-fc9ddef24596@github.com> Message-ID: On Wed, 8 Jan 2025 12:48:54 GMT, Emanuel Peter wrote: >> Where did you add it? I don't see it in the constructor or `add_flag`. > > I also wonder: if the flag `is_commutative_operation` is available at the `Node` level, and not just `VectorNode`, then should this logic here not be at `Node::hash` rather than `VectorNode::hash`? Otherwise, the flag should probably be called `is_commutative_vector_operation`, right? Yes, let's limit this flag to vector IR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1908213214 From jbhateja at openjdk.org Thu Jan 9 06:20:15 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Jan 2025 06:20:15 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v6] In-Reply-To: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: > Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. 
> Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus > obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. > > Following are the performance stats for JMH micro included with the patch. > > > Granite Rapids (P-core Xeon Server) > Baseline : > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms > > Sierra Forest (E-core Xeon Server) > Baseline: > Benchmark (size) Mode Cnt Score Error Units > VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms > VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 
1024 thrpt ... Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: Review comments resolutions. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22863/files - new: https://git.openjdk.org/jdk/pull/22863/files/e9be0de1..32919318 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From qamai at openjdk.org Thu Jan 9 06:36:51 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 9 Jan 2025 06:36:51 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 22:47:58 GMT, Vladimir Ivanov wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Implement apply_identity > > Were there any experiments conducted to port existing lowering transformations to the new pass? > > As we discussed before, there are multiple places in the code where lowering takes place. It is still not clear to me how much proposed solution unifies across existing use cases. What I'd really like to avoid is yet another peculiar way to perform lowering transformations in C2. Hi @iwanowww, to share my thoughts on this, there are 2 places when we do lowering: 1. Macro transform: - This place does lowering in a machine-independent manner. This makes it really awkward try to lower something that is highly dependent on the exact architecture. For example, we want to lower a `MulVL` with a constant into `AddVL`s and `LShiftVL`s. 
On Arm, long vector multiplication can be done pretty efficiently so we want to be conservative. However, on x86, long multiplication is multiple uops and has a massive latency. As a result, we want to be more aggressive in this transformation. Even worse, `vpmullq` is only available on AVX512, so for AVX2, we want to be even more aggressive, maybe even to the point of unconditionally doing the transformation. - It still does machine-independent idealisation on all the nodes. This is the opposite of machine-dependent lowering purposes. Idealisation tries to simplify the graph so we can do analysis and transformation more easily, while lowering tries to complicate the graph so that the final code can get smaller. For example, let's consider an unsigned vector comparison. During idealisation, we want to keep it as is so that we have an easier time moving it around. However, if the machine does not support unsigned vector comparison, we want to break it down to `x + MIN_VALUE <=> y + MIN_VALUE`. 2. Matching: - This place does not do GVN so we do not have much versatility here. Really this should only lower node in a one-to-one manner if we have `PhaseLowering` from before. - Even worse, the matcher uses a custom grammar, which makes it awkward to work with. This leads to some confusing constructs such as `Matcher::pd_clone_node` and `Matcher::pd_clone_address_expressions`. Furthermore, as it can be seen, there are several patches and to-do work that can benefit from this pass and have mentioned this PR. As a result, I think `PhaseLowering` is a beneficial and necessary addition. 
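The two lowering examples mentioned above can be checked on scalars with a minimal sketch. It is not C2 code: the class and method names are hypothetical, the constant 10 is chosen only for illustration, and the loop over an array stands in for vector lanes.

```java
final class LoweringSketch {
    // Unsigned x < y expressed through a signed comparison:
    // adding MIN_VALUE to both sides flips the sign bit and preserves unsigned order.
    static boolean unsignedLessThan(int x, int y) {
        return (x + Integer.MIN_VALUE) < (y + Integer.MIN_VALUE);
    }

    // Multiplication by the constant 10 rewritten as (x << 3) + (x << 1), lane by lane.
    static long[] mulBy10ShiftAdd(long[] lanes) {
        long[] result = new long[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            result[i] = (lanes[i] << 3) + (lanes[i] << 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // -1 is the largest unsigned 32-bit value, so 1 <u -1 holds.
        System.out.println(unsignedLessThan(1, -1));
        System.out.println(java.util.Arrays.toString(mulBy10ShiftAdd(new long[] {3, -7})));
    }
}
```

Whether such a rewrite pays off depends on the target, which is the argument made above for doing it in a backend-specific phase rather than in shared code.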
Cheers, Quan Anh ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2579276134 From duke at openjdk.org Thu Jan 9 07:29:39 2025 From: duke at openjdk.org (erifan) Date: Thu, 9 Jan 2025 07:29:39 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 06:34:16 GMT, Quan Anh Mai wrote: >> Were there any experiments conducted to port existing lowering transformations to the new pass? >> >> As we discussed before, there are multiple places in the code where lowering takes place. It is still not clear to me how much proposed solution unifies across existing use cases. What I'd really like to avoid is yet another peculiar way to perform lowering transformations in C2. > > Hi @iwanowww, to share my thoughts on this, there are 2 places when we do lowering: > > 1. Macro transform: > - This place does lowering in a machine-independent manner. This makes it really awkward try to lower something that is highly dependent on the exact architecture. For example, we want to lower a `MulVL` with a constant into `AddVL`s and `LShiftVL`s. On Arm, long vector multiplication can be done pretty efficiently so we want to be conservative. However, on x86, long multiplication is multiple uops and has a massive latency. As a result, we want to be more aggressive in this transformation. Even worse, `vpmullq` is only available on AVX512, so for AVX2, we want to be even more aggressive, maybe even to the point of unconditionally doing the transformation. > - It still does machine-independent idealisation on all the nodes. This is the opposite of machine-dependent lowering purposes. Idealisation tries to simplify the graph so we can do analysis and transformation more easily, while lowering tries to complicate the graph so that the final code can get smaller. For example, let's consider an unsigned vector comparison. 
During idealisation, we want to keep it as is so that we have an easier time moving it around. However, if the machine does not support unsigned vector comparison, we want to break it down to `x + MIN_VALUE <=> y + MIN_VALUE`. > > 2. Matching: > - This place does not do GVN so we do not have much versatility here. Really this should only lower node in a one-to-one manner if we have `PhaseLowering` from before. > - Even worse, the matcher uses a custom grammar, which makes it awkward to work with. This leads to some confusing constructs such as `Matcher::pd_clone_node` and `Matcher::pd_clone_address_expressions`. > > Furthermore, as it can be seen, there are several patches and to-do work that can benefit from this pass and have mentioned this PR. As a result, I think `PhaseLowering` is a beneficial and necessary addition. > > Cheers, > Quan Anh Hi @merykitty, I noticed you mentioned the optimization of vector multiplication to shifts and adds. Since I have been working on this recently, in order to avoid duplication of work, I'd like to ask if you have any plans to do this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2579337022 From mli at openjdk.org Thu Jan 9 08:00:50 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Jan 2025 08:00:50 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v3] In-Reply-To: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> References: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> Message-ID: On Thu, 9 Jan 2025 02:01:05 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2].
But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. >> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Thanks for updating, looks good! ------------- Marked as reviewed by mli (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22874#pullrequestreview-2538966235 From mli at openjdk.org Thu Jan 9 08:14:40 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 9 Jan 2025 08:14:40 GMT Subject: [jdk24] RFR: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 10:57:48 GMT, Fei Yang wrote: > Hi all, > > Same issue is there in jdk24 repo. > > This pull request contains a backport of commit [3f7052ed](https://github.com/openjdk/jdk/commit/3f7052ed7af89efd1c6977df0b4f3b95fcfec764) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Fei Yang on 7 Jan 2025 and was reviewed by Robbin Ehn and Hamlin Li. > > Thanks! Looks good, Thanks! ------------- Marked as reviewed by mli (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22945#pullrequestreview-2538990372 From epeter at openjdk.org Thu Jan 9 08:21:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 08:21:40 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 15:21:36 GMT, Daniel Lundén wrote: > we never add anti-dependence edges directly between the load and Phi nodes Ah interesting. Do you know why we do not do that? Would that generate worse code? Because it seems to me that would add fewer edges, and would probably require a smaller traversal. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908330988 From duke at openjdk.org Thu Jan 9 08:55:41 2025 From: duke at openjdk.org (Yagmur Eren) Date: Thu, 9 Jan 2025 08:55:41 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified [v2] In-Reply-To: <0_9qnnULKUSiN2W82J-1C9w6ipDAMLpRounLQJgMl30=.04c093b1-3868-4747-9382-2530198971b6@github.com> References: <9Y-j_kyLeTWwraDrmfZWu1vle6fFxiwczcOcSNLyfMs=.a8ef5382-6237-4edd-b272-c5328f0884bc@github.com> <0_9qnnULKUSiN2W82J-1C9w6ipDAMLpRounLQJgMl30=.04c093b1-3868-4747-9382-2530198971b6@github.com> Message-ID: On Wed, 8 Jan 2025 14:51:34 GMT, Yagmur Eren wrote: >> Ok, just discussed it with some other engineers: >> Please file a Follow-up RFE. Make `Node::_idx` private, and replace all uses with an accessor. > >> You need 2 people with the reviewer role for this, not sure if it is the intention of @eme64 > > Yes, I realized a bit late that the second one was a committer approval... Thanks for highlighting it @merykitty. > @nelanbu do you want to work on https://bugs.openjdk.org/browse/JDK-8347275 yourself, or should I find a new-hire / beginner external to work on this? I have seen that @merykitty took over it. Thanks for taking care of it!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2579486284 From dlunden at openjdk.org Thu Jan 9 08:58:41 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 9 Jan 2025 08:58:41 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: <-JSWKd-qu_IPFR7xS_epeyPiqVg7K_aAYSQCD6BubBI=.9381a704-662d-4ca8-b191-a60f3c30f460@github.com> On Thu, 9 Jan 2025 08:18:34 GMT, Emanuel Peter wrote: >> Thanks for the comments @eme64! >> >>> In your example 1, why do we therefore not put an anti-dependency edge between the 183 load, and the 106 Phi? Would that not be enough to ensure the load is scheduled before the other memory affecting nodes further below 106 Phi? >>> >>> Or is the issue that this traversal is somehow restricted to blocks - I don't remember that from last time... >> I'll keep reading the changes now. >> >> Yes, Phis only result in LCA changes at the block level and we never add anti-dependence edges directly between the load and Phi nodes. In example 1, we do mark the last block in between the path from 107 Phi to 106 Phi (which is B27) for raising the LCA. However, when doing the joint LCA raising operation later on (`raise_LCA_above_marks`), we start at the original LCA and stop when we reach the early block (B20). Therefore, we never even consider B27. My very first attempt at solving this issue was to try and identify some dominance relation between the early block and the blocks for and in between 107 Phi and 106 Phi and use this information to force the LCA to the early block. This kind of worked at the block level, but we still need to identify somehow that we need an anti-dependence edge to 64 membar_release. Otherwise, it can happen that the load is scheduled correctly in the early block, but incorrectly (after an overwriting store) within the block (easily verified with `-XX:+StressLCM`).
>> >>> And in example 2, we should schedule before the Phi as well: >> Why don't we do that? >> >> Same here as above, we do tag both B19 and B21 for raising the LCA, but never consider them in `raise_LCA_above_marks` since they are above the early block B9. > >> we never add anti-dependence edges directly between the load and Phi nodes > > Ah interesting. Do you know why we do not do that? Would that generate worse code? Because it seems to me that would add fewer edges, and would probably require a smaller traversal. What do you think? My understanding is that anti-dependence edges are only relevant for local scheduling (within blocks). Because Phis merge control-flow paths (by definition at the start of blocks), I would say it makes little sense to add an anti-dependence edge to Phi nodes. Does it make sense semantically to schedule loads before Phi nodes within a block? I don't think so, but I may be wrong. I think what you are getting at is, at the block level, whether or not it is possible to raise the LCA above the Phi itself, rather than before the relevant inputs. That would make the scheduling less conservative. Have a look at [this comment](https://github.com/openjdk/jdk/pull/22852/files#diff-13dc4f80ba6ccaa27b0612318074e35200ffe9314405e30ace331807e56b5f60L870-L876) in the source. There are previous attempts at making the LCA raising less conservative in this manner (see, e.g., [JDK-8192992](https://bugs.openjdk.org/browse/JDK-8192992)), but it turns out to be quite tricky to get right. It is definitely an issue separate from the present one! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908378186 From tweidmann at openjdk.org Thu Jan 9 09:10:17 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 09:10:17 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v7] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <6z8EHqIn0_1n9AFCIBd9DTz0JC3pMoHReg2ARmmWiUU=.055c378d-437b-4a55-9ac7-f552ed26dd2f@github.com> > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
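To make the goal stated above concrete (mixing "interesting" special values into otherwise random input), here is a minimal sketch. It does not use the `Generators` library from the PR, whose API is not shown in this thread. The class name, the `SPECIAL` table, the `mixed` method, and the `specialProbability` parameter are all hypothetical.

```java
import java.util.Random;
import java.util.function.Supplier;

final class InterestingValuesSketch {
    // Hand-picked doubles that tend to provoke edge cases in floating-point code.
    static final double[] SPECIAL = {
        0.0, -0.0, Double.NaN,
        Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY,
        Double.MIN_VALUE, Double.MAX_VALUE
    };

    // With probability specialProbability return a special value,
    // otherwise a double built from uniformly random bits.
    static Supplier<Double> mixed(Random rng, double specialProbability) {
        return () -> rng.nextDouble() < specialProbability
                ? SPECIAL[rng.nextInt(SPECIAL.length)]
                : Double.longBitsToDouble(rng.nextLong());
    }

    public static void main(String[] args) {
        Supplier<Double> gen = mixed(new Random(42), 0.2);
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.get());
        }
    }
}
```

Passing the `Random` in explicitly keeps a failing test reproducible from its seed, which is one reason a shared generator utility is attractive for compiler tests.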
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Add ExampleTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/6e247dcd..1e08d990 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=05-06 Stats: 82 lines in 1 file changed: 82 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Thu Jan 9 09:12:37 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 09:12:37 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v7] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: On Tue, 7 Jan 2025 13:25:55 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Add ExampleTest > > test/hotspot/jtreg/compiler/lib/generators/Generators.java line 39: > >> 37: * optimizations. >> 38: *
> >> 39: * Normally, clients get the default Generators instance by referring to the static variable {@link #G}. > > It would be nice to have an example test that uses it as you would expect. I added ExampleTest.java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908396784 From epeter at openjdk.org Thu Jan 9 09:28:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 09:28:36 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: <-JSWKd-qu_IPFR7xS_epeyPiqVg7K_aAYSQCD6BubBI=.9381a704-662d-4ca8-b191-a60f3c30f460@github.com> References: <-JSWKd-qu_IPFR7xS_epeyPiqVg7K_aAYSQCD6BubBI=.9381a704-662d-4ca8-b191-a60f3c30f460@github.com> Message-ID: <7Ug2vWAiJTao6xkZxWb1KKiGCAjkoJPCHnNML4t7H2w=.a8896ec0-83ca-4019-9b3c-3ea3b06d4a98@github.com> On Thu, 9 Jan 2025 08:56:28 GMT, Daniel Lundén wrote: >>> we never add anti-dependence edges directly between the load and Phi nodes >> >> Ah interesting. Do you know why we do not do that? Would that generate worse code? Because it seems to me that would add fewer edges, and would probably require a smaller traversal. What do you think? > > My understanding is that anti-dependence edges are only relevant for local scheduling (within blocks). Because Phis merge control-flow paths (by definition at the start of blocks), I would say it makes little sense to add an anti-dependence edge to Phi nodes. Does it make sense semantically to schedule loads before Phi nodes within a block? I don't think so, but I may be wrong. > > I think what you are getting at is, at the block level, whether or not it is possible to raise the LCA above the Phi itself, rather than before the relevant inputs. That would make the scheduling less conservative.
Have a look at [this comment](https://github.com/openjdk/jdk/pull/22852/files#diff-13dc4f80ba6ccaa27b0612318074e35200ffe9314405e30ace331807e56b5f60L870-L876) in the source. There are previous attempts at making the LCA raising less conservative in this manner (see, e.g., [JDK-8192992](https://bugs.openjdk.org/browse/JDK-8192992)), but it turns out to be quite tricky to get right. It is definitely an issue separate from the present one! Hmm, ok. I think the description here should be made more clear then, and explain what the strategy is, and attempt a kind of informal proof why this is an ok approach. What do you think? I mean your comment says there are other cases, but it does not really reassure me that we have all cases covered. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908420509 From tweidmann at openjdk.org Thu Jan 9 09:32:44 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 09:32:44 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v8] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests.
>> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Improve javadoc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/1e08d990..aa42debe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=06-07 Stats: 10 lines in 1 file changed: 6 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Thu Jan 9 09:36:42 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 09:36:42 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v9] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. 
>> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Clarify example test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/aa42debe..818d0386 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=07-08 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Thu Jan 9 09:59:05 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 09:59:05 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v2] In-Reply-To: References: Message-ID: > C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. 
For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: > > > public static double process(final double x) { > double w = (double) 0.1; > double p = 0; > p = (double) (3.109615012413746E307 % (w % Z)); > p = (double) (7.614949555185036E307 / (x % x)); // <- return value only dependends on this line > return (double) (x * p); > } > > > To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. 
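The benefit of keeping the remainder as a foldable node rather than an opaque runtime call can be illustrated with a minimal sketch. It is not the PR's implementation and does not reproduce its exact idealization rules: the `Operand` record and the `tryFold` method are hypothetical, and only the both-operands-constant case is modelled, using Java's own `%` semantics for doubles.

```java
import java.util.OptionalDouble;

final class ModFoldSketch {
    // Hypothetical operand: either a compile-time constant or an unknown runtime value.
    record Operand(boolean isConstant, double value) {
        static Operand constant(double v) { return new Operand(true, v); }
        static Operand unknown() { return new Operand(false, Double.NaN); }
    }

    // Folds dividend % divisor when both are constants; otherwise reports "not foldable",
    // which in this sketch corresponds to leaving the operation for a later runtime call.
    static OptionalDouble tryFold(Operand dividend, Operand divisor) {
        if (dividend.isConstant() && divisor.isConstant()) {
            return OptionalDouble.of(dividend.value() % divisor.value());
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(tryFold(Operand.constant(5.5), Operand.constant(2.0)));
        System.out.println(tryFold(Operand.unknown(), Operand.constant(2.0)));
    }
}
```

A remainder emitted directly as a call during parsing never gets this chance, which matches the redundant runtime calls described above.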
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/callnode.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22786/files - new: https://git.openjdk.org/jdk/pull/22786/files/b4e9838c..974f64b3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22786/head:pull/22786 PR: https://git.openjdk.org/jdk/pull/22786 From tweidmann at openjdk.org Thu Jan 9 09:59:05 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 09:59:05 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v2] In-Reply-To: References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Wed, 8 Jan 2025 07:57:25 GMT, Emanuel Peter wrote: >> Suggestion: >> >> if (in(0) == nullptr || phase->type(in(0)) == Type::TOP) { >> >> >> `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.` >> >> See >> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md > > Why can this now be nullptr? This can happen due to the idealizations implemented for ModF/ModD nodes. The same code is also in Valhalla already, btw. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908466767 From tweidmann at openjdk.org Thu Jan 9 10:02:35 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 10:02:35 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v2] In-Reply-To: References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Wed, 8 Jan 2025 08:20:50 GMT, Emanuel Peter wrote: >> He has a call from `Bytecodes::_frem:` and from `Bytecodes::_drem:`. >> >> Why not make it a `BasicType bt` instead of `dbl`, and then switch on that? Might be more readable than true / false. >> I read `floating_point_mod(a, b, true)`, and am not sure what the `true` does. > > Why do you need the `static_cast`? I mean why not use the common type `ModFloatingNode*`, which is a subtype of `CallNode*`, right? > Why you need the case for ModDNode? As @eme64 already explained this function is both for float and double. "floating point" here is supposed to mean both float and double. > Why not make it a BasicType bt instead of dbl, and then switch on that? Might be more readable than true / false. I read floating_point_mod(a, b, true), and am not sure what the true does. Good point. I will change it. > Why do you need the static_cast? I mean why not use the common type ModFloatingNode*, which is a subtype of CallNode*, right? The cast is necessary because of the ternary operator but you are right that ModFloatingNode could be used as a more concrete subtype here. I will change it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908472641 From tweidmann at openjdk.org Thu Jan 9 10:05:36 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 10:05:36 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v2] In-Reply-To: References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Thu, 9 Jan 2025 09:59:29 GMT, Theo Weidmann wrote: >> Why do you need the `static_cast`? I mean why not use the common type `ModFloatingNode*`, which is a subtype of `CallNode*`, right? > >> Why you need the case for ModDNode? > > As @eme64 already explained this function is both for float and double. "floating point" here is supposed to mean both float and double. > >> Why not make it a BasicType bt instead of dbl, and then switch on that? Might be more readable than true / false. > I read floating_point_mod(a, b, true), and am not sure what the true does. > > Good point. I will change it. > >> Why do you need the static_cast? I mean why not use the common type ModFloatingNode*, which is a subtype of CallNode*, right? > > The cast is necessary because of the ternary operator but you are right that ModFloatingNode could be used as a more concrete subtype here. I will change it. Actually the assignment further down from `as_Call` fails, so I'll leave it with CallNode. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908477564 From thartmann at openjdk.org Thu Jan 9 10:13:21 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Jan 2025 10:13:21 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v3] In-Reply-To: References: Message-ID: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. 
> > Best regards, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Moved cast into guard ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22967/files - new: https://git.openjdk.org/jdk/pull/22967/files/5c0292a8..3b465a4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=01-02 Stats: 55 lines in 4 files changed: 8 ins; 8 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/22967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22967/head:pull/22967 PR: https://git.openjdk.org/jdk/pull/22967 From thartmann at openjdk.org Thu Jan 9 10:22:12 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Jan 2025 10:22:12 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. 
> > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Copyright date ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22967/files - new: https://git.openjdk.org/jdk/pull/22967/files/3b465a4b..0a1fe387 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22967&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22967.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22967/head:pull/22967 PR: https://git.openjdk.org/jdk/pull/22967 From thartmann at openjdk.org Thu Jan 9 10:22:13 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Jan 2025 10:22:13 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v3] In-Reply-To: References: Message-ID: <7Sz-NJmdqr3NavPPGz7eNcUbCjybbVoFVA6OKll1t8I=.7686c684-71fa-4e37-8dd6-bb722b044d82@github.com> On Thu, 9 Jan 2025 10:13:21 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> 
https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Moved cast into guard Roland, Quan Anh, thanks for the reviews! I pushed a new version that should address all comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2579707064 From thartmann at openjdk.org Thu Jan 9 10:22:13 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Jan 2025 10:22:13 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 16:09:28 GMT, Quan Anh Mai wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Added missing stopped checks, refactoring and updated copyright dates > > src/hotspot/share/opto/library_call.cpp line 5921: > >> 5919: generate_non_array_guard(load_object_klass(src), slow_region); >> 5920: if (!stopped()) { >> 5921: src = _gvn.transform(new CheckCastPPNode(control(), src, TypeAryPtr::BOTTOM)); > > Why is this a `CheckCastPP` and not a `CastPP`? 
My understanding is that a `CheckCastPP` is used when we force changing the type of a node (e.g. a raw pointer of `Allocate` into a typed pointer), so we do not join the type of the input with that of the output. Good point, I changed that. > src/hotspot/share/opto/type.hpp line 1476: > >> 1474: >> 1475: // Convenience common pre-built types. >> 1476: static const TypeAryPtr* BOTTOM; > > While you are here it may be better to change the other constant to `TypeAryPtr*` instead of `TypeAryPtr *` Right, done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1908496196 PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1908496011 From tweidmann at openjdk.org Thu Jan 9 10:33:02 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 10:33:02 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v3] In-Reply-To: References: Message-ID: > C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > The only platform that currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevents any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases.
This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls:
>
> public static double process(final double x) {
>     double w = (double) 0.1;
>     double p = 0;
>     p = (double) (3.109615012413746E307 % (w % Z));
>     p = (double) (7.614949555185036E307 / (x % x)); // <- return value only depends on this line
>     return (double) (x * p);
> }
>
> To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: - Merge branch '8345766-floating-mod-macro' of https://github.com/theoweidmannoracle/jdk into 8345766-floating-mod-macro - Use basic type instead ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22786/files - new: https://git.openjdk.org/jdk/pull/22786/files/974f64b3..01b69b47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=01-02 Stats: 6 lines in 2 files changed: 1 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22786/head:pull/22786 PR: https://git.openjdk.org/jdk/pull/22786 From djelinski at openjdk.org Thu Jan 9 10:35:42 2025 From: djelinski at openjdk.org (Daniel Jeliński) Date: Thu, 9 Jan 2025 10:35:42 GMT Subject: RFR: 8345471: Clean up compiler/intrinsics/sha/cli tests In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 15:48:52 GMT, Daniel Jeliński wrote: > Merge all the GenericTestCaseForUnsupportedXXXCPU and GenericTestCaseForOtherCPU into GenericTestCaseForUnsupportedCPU.java.
> > The CPU-specific files are almost identical; I chose to resolve the differences in favor of the AArch64 version. The OtherCPU version looks wrong, and it wasn't executed on any supported platform. > > The tests continue to pass on linux-aarch64/x64, windows-x64 and mac-aarch64. I didn't test other platforms. > > After the change, the tests will start running on PPC and S390. They will also automatically run on any new architectures. > > For those interested in historical background, when the tests were introduced, there were only 2 supported CPU architectures. X86 did not support any of the intrinsics, and the X86 test case did not even call `getPredicateForOption`. The call to `getPredicateForOption` was added in f2e9b827d699115f8683e9def06c249e5476fd50, and since then all the cases are the same. I'd like to merge this early next week; if anyone still wants to review this but needs more time, please leave a note. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22517#issuecomment-2579764838 From tweidmann at openjdk.org Thu Jan 9 11:12:43 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:12:43 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v3] In-Reply-To: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> References: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> Message-ID: On Wed, 8 Jan 2025 07:59:53 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch '8345766-floating-mod-macro' of https://github.com/theoweidmannoracle/jdk into 8345766-floating-mod-macro >> - Use basic type instead > > src/hotspot/share/opto/divnode.cpp line 61: > >> 59: init_req(TypeFunc::Parms + 0, a); >> 60: init_req(TypeFunc::Parms + 1, b); >> 61: } > > Is there a reason to put this in the cpp file? 
I think I usually see constructors for Nodes in the hpp file. Nitpicky, sorry. Yes, I wanted to avoid importing the runtime header (OptoRuntime) into the divnode header. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908579896 From tweidmann at openjdk.org Thu Jan 9 11:12:44 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:12:44 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v3] In-Reply-To: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Tue, 7 Jan 2025 17:00:44 GMT, Vladimir Kozlov wrote: >> Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch '8345766-floating-mod-macro' of https://github.com/theoweidmannoracle/jdk into 8345766-floating-mod-macro >> - Use basic type instead > > src/hotspot/share/opto/parse2.cpp line 1103: > >> 1101: >> 1102: Node* prev_mem = set_predefined_input_for_runtime_call(mod); >> 1103: mod = _gvn.transform(mod)->as_Call(); > > Is `as_Call()` used to check with assert?
Yes, as_Call does check with assert ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908577319 From jbhateja at openjdk.org Thu Jan 9 11:33:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 9 Jan 2025 11:33:17 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v7] In-Reply-To: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: <4cKmKYlejtKkkUOV9NPh4V6qeaPyIczG3zrddCC9CxU=.0c3503a8-72d4-4911-94fc-0a30ac9fed29@github.com> > Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering. > Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus > obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. > > Following are the performance stats for JMH micro included with the patch. 
>
>
> Granite Rapids (P-core Xeon Server)
> Baseline:
> Benchmark                                                                (size)   Mode  Cnt      Score  Error   Units
> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   8982.549         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   6072.773         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2368.856         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  15215.087         ops/ms
>
> Withopt:
> Benchmark                                                                (size)   Mode  Cnt      Score  Error   Units
> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2  11963.554         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   7036.088         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2   2906.731         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt    2  17148.131         ops/ms
>
> Sierra Forest (E-core Xeon Server)
> Baseline:
> Benchmark                                                                (size)   Mode  Cnt      Score  Error   Units
> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing     1024  thrpt    2   2444.359         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing      1024  thrpt    2   1710.256         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing     1024  thrpt    2    308.766         ops/ms
> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing    1024  thrpt ...
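The idea in the quoted patch summary, marking commutative operations with a flag at construction time and letting value-numbering hashing treat swapped inputs as equal, can be pictured with a toy hash table. The sketch below is only an illustration under stated assumptions: `CommutativeGvnSketch`, `Node`, and `valueNumber` are hypothetical names invented here, and nothing in it reflects how C2 actually lays out nodes or computes hashes.

```java
import java.util.HashMap;
import java.util.Map;

public class CommutativeGvnSketch {

    /** Hypothetical toy IR node: an opcode label, two input ids, and a commutativity flag. */
    public static final class Node {
        final String op;
        final int in1;
        final int in2;
        final boolean commutative; // decided once, when the node is constructed

        public Node(String op, int in1, int in2, boolean commutative) {
            this.op = op;
            this.in1 = in1;
            this.in2 = in2;
            this.commutative = commutative;
        }

        @Override
        public int hashCode() {
            // Commutative nodes combine inputs symmetrically, so (a, b) and (b, a) hash alike.
            int inputs = commutative ? in1 + in2 : 31 * in1 + in2;
            return 31 * op.hashCode() + inputs;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Node)) {
                return false;
            }
            Node n = (Node) other;
            if (!op.equals(n.op) || commutative != n.commutative) {
                return false;
            }
            if (in1 == n.in1 && in2 == n.in2) {
                return true;
            }
            // Swapped inputs only count as equal when the operation is flagged commutative.
            return commutative && in1 == n.in2 && in2 == n.in1;
        }
    }

    private final Map<Node, Node> table = new HashMap<>();

    /** Returns an already registered equivalent node, or registers and returns n. */
    public Node valueNumber(Node n) {
        Node existing = table.putIfAbsent(n, n);
        return existing != null ? existing : n;
    }

    public static void main(String[] args) {
        CommutativeGvnSketch gvn = new CommutativeGvnSketch();
        Node add1 = gvn.valueNumber(new Node("add", 1, 2, true));
        Node add2 = gvn.valueNumber(new Node("add", 2, 1, true));
        Node sub1 = gvn.valueNumber(new Node("sub", 1, 2, false));
        Node sub2 = gvn.valueNumber(new Node("sub", 2, 1, false));
        System.out.println("add shared: " + (add1 == add2));
        System.out.println("sub shared: " + (sub1 == sub2));
    }
}
```

In this toy, the two `add` nodes collapse into one shared instance while the two `sub` nodes stay distinct. No input reordering step is needed, which mirrors the trade-off the summary describes against sorting inputs during idealization.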
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: GHA fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22863/files - new: https://git.openjdk.org/jdk/pull/22863/files/32919318..86d0145c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22863&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22863.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22863/head:pull/22863 PR: https://git.openjdk.org/jdk/pull/22863 From chagedorn at openjdk.org Thu Jan 9 11:34:46 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Jan 2025 11:34:46 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi [v2] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:42:18 GMT, Roland Westrelin wrote: >> The assert fires during split thru phi because a call to `Identity` >> returns a new node (a constant null pointer). That happens because a >> `Load`, once pushed thru phi, can be constant folded because it loads >> from a newly allocated array. `Identity` shouldn't return new >> nodes. When split thru phi runs, in this case, `Value` should be the >> one returning constant null, not `Identity`. There is logic for that >> in `LoadNode::Value` but it's after some other checks that cause >> `Value` to return too early. >> >> To fix this, I propose reordering checks in `LoadNode::Value`. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - more > - Merge branch 'master' into JDK-8346184 > - more > - test > - fix The updated fix looks good. Let me run some testing again. 
>> Should we generally add some verification code that Identity() does not create new nodes? Hard to do it in all places but it could, for example, be done in the transform() methods of GVN and IGVN when calling Identity(). But of course, should probably be done in a separate RFE. > > That sounds like a good idea. I filed: https://bugs.openjdk.org/browse/JDK-8347266 Thanks! src/hotspot/share/opto/memnode.cpp line 2020: > 2018: // if the load is provably beyond the header of the object. > 2019: // (Also allow a variable load from a fresh array to produce zero.) > 2020: const TypeOopPtr *tinst = tp->isa_oopptr(); While at it: Suggestion: const TypeOopPtr* tinst = tp->isa_oopptr(); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22818#pullrequestreview-2539669516 PR Review Comment: https://git.openjdk.org/jdk/pull/22818#discussion_r1908613661 From tweidmann at openjdk.org Thu Jan 9 11:37:37 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:37:37 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v3] In-Reply-To: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> References: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> Message-ID: On Wed, 8 Jan 2025 08:09:10 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch '8345766-floating-mod-macro' of https://github.com/theoweidmannoracle/jdk into 8345766-floating-mod-macro >> - Use basic type instead > > src/hotspot/share/opto/divnode.cpp line 1409: > >> 1407: } >> 1408: >> 1409: Node* ModFNode::Ideal(PhaseGVN* phase, bool can_reshape) { > > Can you quickly say why you converted this from a `Value` to an `Ideal` method?
I guess it is because before it used to be a simple `Node` with a single output, but now it is a `Call` with multiple outputs... Ok makes sense. Yes, because it's a call now with multiple outputs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908622479 From tweidmann at openjdk.org Thu Jan 9 11:38:03 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:38:03 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v10] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <3RdTHsd8xielC9SojAPzQvcBJrL3pYzdcoghS32hTvE=.1d107ef2-973b-4c2d-ac1d-20dea821e94b@github.com> > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary.
An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Update ExampleTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/818d0386..4fc2012b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=08-09 Stats: 26 lines in 1 file changed: 6 ins; 7 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Thu Jan 9 11:45:59 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:45:59 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v4] In-Reply-To: References: Message-ID: > C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > The only platform that currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevents any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases.
This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls:
>
> public static double process(final double x) {
>     double w = (double) 0.1;
>     double p = 0;
>     p = (double) (3.109615012413746E307 % (w % Z));
>     p = (double) (7.614949555185036E307 / (x % x)); // <- return value only depends on this line
>     return (double) (x * p);
> }
>
> To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/macro.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22786/files - new: https://git.openjdk.org/jdk/pull/22786/files/01b69b47..469714b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22786/head:pull/22786 PR: https://git.openjdk.org/jdk/pull/22786 From tweidmann at openjdk.org Thu Jan 9 11:45:59 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:45:59 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v4] In-Reply-To: References: <26SaQSwYmaVrCRI7OdE7LSanlixD2iV5zvaNqAwfz8Y=.9c8d450e-7453-4f15-8b91-dc612632a931@github.com> Message-ID: On Wed, 8 Jan 2025 08:04:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/divnode.cpp line 1401: >> >>> 1399: //============================================================================= >>> 1400:
//------------------------------Idealize--------------------------------------- >>> 1401: Node *UModLNode::Ideal(PhaseGVN *phase, bool can_reshape) { >> >> Suggestion: >> >> Node* UModLNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> >> Ah, maybe you did not mean to touch it, but on GitHub it looks like you did... Maybe you just reordered things. > > Makes it a little trickier to review though. Yes, I think I just re-ordered this without realizing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1908634798 From tweidmann at openjdk.org Thu Jan 9 11:48:57 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 11:48:57 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v5] In-Reply-To: References: Message-ID: > C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > The only platform that currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevents any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases.
This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: > > > public static double process(final double x) { > double w = (double) 0.1; > double p = 0; > p = (double) (3.109615012413746E307 % (w % Z)); > p = (double) (7.614949555185036E307 / (x % x)); // <- return value only depends on this line > return (double) (x * p); > } > > > To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/macro.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22786/files - new: https://git.openjdk.org/jdk/pull/22786/files/469714b6..5f0abf21 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=03-04 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22786/head:pull/22786 PR: https://git.openjdk.org/jdk/pull/22786 From epeter at openjdk.org Thu Jan 9 12:08:40 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 12:08:40 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v10] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <3RdTHsd8xielC9SojAPzQvcBJrL3pYzdcoghS32hTvE=.1d107ef2-973b-4c2d-ac1d-20dea821e94b@github.com> Message-ID: On Thu, 9 Jan 2025 12:03:22 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last 
revision: >> >> Update ExampleTest.java > > test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 370: > >> 368: Asserts.assertGreaterThanOrEqual(x, lo); >> 369: Asserts.assertLessThanOrEqual(x, hi); >> 370: } > > It might be a bit of work, but it could be nice if we had these range checks for all of the restrictable generators. Well at least those that are safely restrictable without exception. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908668069 From epeter at openjdk.org Thu Jan 9 12:08:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 12:08:39 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v10] In-Reply-To: <3RdTHsd8xielC9SojAPzQvcBJrL3pYzdcoghS32hTvE=.1d107ef2-973b-4c2d-ac1d-20dea821e94b@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <3RdTHsd8xielC9SojAPzQvcBJrL3pYzdcoghS32hTvE=.1d107ef2-973b-4c2d-ac1d-20dea821e94b@github.com> Message-ID: On Thu, 9 Jan 2025 11:38:03 GMT, Theo Weidmann wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. 
>>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Update ExampleTest.java A few more comments about the tests. test/hotspot/jtreg/testlibrary_tests/generators/tests/ExampleTest.java line 81: > 79: underTest.doIt(g.next()); > 80: } > 81: } Nitpick: why the inner loop, and not just a longer outer loop? test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 370: > 368: Asserts.assertGreaterThanOrEqual(x, lo); > 369: Asserts.assertLessThanOrEqual(x, hi); > 370: } It might be a bit of work, but it could be nice if we had these range checks for all of the restrictable generators. 
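A minimal sketch of what such a generic range check could look like. Everything here is an assumption made for illustration: `RangeCheckSketch`, `uniformInts` and `allInBounds` are hypothetical names, and `IntSupplier` merely stands in for a restrictable generator; none of this is the API of the `Generators` library under review.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Hypothetical stand-in for a restrictable generator; not the library's API.
final class RangeCheckSketch {
    /** Returns a supplier of uniformly distributed ints in [lo, hi], both inclusive. */
    static IntSupplier uniformInts(Random rnd, int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("empty range");
        }
        // Widen to long so that hi - lo + 1 cannot overflow for large ranges.
        long span = (long) hi - lo + 1;
        return () -> (int) (lo + Math.floorMod(rnd.nextLong(), span));
    }

    /** Draws the given number of samples and reports whether all lie in [lo, hi]. */
    static boolean allInBounds(IntSupplier gen, int lo, int hi, int samples) {
        for (int i = 0; i < samples; i++) {
            int x = gen.getAsInt();
            if (x < lo || x > hi) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Pick random restriction bounds and only check that results stay in bounds.
        for (int round = 0; round < 100; round++) {
            int a = rnd.nextInt();
            int b = rnd.nextInt();
            int lo = Math.min(a, b);
            int hi = Math.max(a, b);
            if (!allInBounds(uniformInts(rnd, lo, hi), lo, hi, 1_000)) {
                throw new AssertionError("value out of [" + lo + ", " + hi + "]");
            }
        }
        System.out.println("all samples in bounds");
    }
}
```

The check deliberately knows nothing about the distribution, so the same helper could be pointed at any generator that accepts a restriction.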
test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 480: > 478: mockSource.checkEmpty().enqueueFloat(2, 5, 4); > 479: var f4 = mockGS.safeRestrictFloat(mockGS.uniformFloats(0, 1),2, 5); > 480: Asserts.assertEQ(f4.next(), 4f); You could test `safeRestrict...` on all generators from `G`, and also the random ones like `G.ints()` Pick random restriction bounds, and just check that the results are in bounds ------------- PR Review: https://git.openjdk.org/jdk/pull/22941#pullrequestreview-2539733716 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908661169 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908667086 PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908670387 From tweidmann at openjdk.org Thu Jan 9 12:49:45 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 12:49:45 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v11] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. 
>> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/lib/generators/UniformIntersectionRestrictableGenerator.java Co-authored-by: Emanuel Peter - Explain MixedGenerator constructor better ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/4fc2012b..536f7448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=09-10 Stats: 12 lines in 2 files changed: 9 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Thu Jan 9 12:49:47 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 12:49:47 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v10] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <3RdTHsd8xielC9SojAPzQvcBJrL3pYzdcoghS32hTvE=.1d107ef2-973b-4c2d-ac1d-20dea821e94b@github.com> Message-ID: On Thu, 9 Jan 2025 12:05:21 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Update ExampleTest.java > > 
test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 480: > >> 478: mockSource.checkEmpty().enqueueFloat(2, 5, 4); >> 479: var f4 = mockGS.safeRestrictFloat(mockGS.uniformFloats(0, 1),2, 5); >> 480: Asserts.assertEQ(f4.next(), 4f); > > You could test `safeRestrict...` on all generators from `G`, and also the random ones like `G.ints()` > > Pick random restriction bounds, and just check that the results are in bounds I don't think it's necessary to test safeRestrict more extensively. These test cases already exercise all branches and paths through safeRestrict. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908729605 From tweidmann at openjdk.org Thu Jan 9 12:49:46 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 12:49:46 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v9] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <8IQFyv5Jelk_OI-IKJAydhTkRvEwPE2xuHjuvd4cVUo=.a925ffc5-2694-4c13-ba27-ffbfa264a49c@github.com> On Thu, 9 Jan 2025 11:32:07 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Clarify example test > > test/hotspot/jtreg/compiler/lib/generators/Generators.java line 293: > >> 291: public RestrictableGenerator uniformIntsMixedWithSpecials(int weightA, int weightB, int rangeSpecial) { >> 292: return mixed(uniformInts(), specialInts(rangeSpecial), weightA, weightB); >> 293: } > > I think you should give the weights a better name -> weightUniform and weightSpecial ? Yup, that really needs to be changed. 
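To illustrate why the descriptive names help, here is a small sketch of a two-way weighted mix. It is only a sketch under stated assumptions: `MixedSketch` and its `mixed` method are hypothetical, `IntSupplier` stands in for the library's generator type, and the selection logic is one plausible reading of "weights", not the implementation in the PR.

```java
import java.util.Random;
import java.util.function.IntSupplier;

// Hypothetical illustration; not the Generators library's mixed(...) implementation.
final class MixedSketch {
    /**
     * Returns a supplier that draws from `uniform` with probability
     * weightUniform / (weightUniform + weightSpecial), and from `special` otherwise.
     */
    static IntSupplier mixed(Random rnd, IntSupplier uniform, IntSupplier special,
                             int weightUniform, int weightSpecial) {
        if (weightUniform < 0 || weightSpecial < 0) {
            throw new IllegalArgumentException("weights must be non-negative");
        }
        // addExact guards against silent overflow of the combined weight.
        int total = Math.addExact(weightUniform, weightSpecial);
        if (total == 0) {
            throw new IllegalArgumentException("at least one weight must be positive");
        }
        return () -> rnd.nextInt(total) < weightUniform
                ? uniform.getAsInt()
                : special.getAsInt();
    }

    public static void main(String[] args) {
        Random rnd = new Random(0);
        // 9 parts uniform values, 1 part the special value Integer.MIN_VALUE.
        IntSupplier g = mixed(rnd, rnd::nextInt, () -> Integer.MIN_VALUE, 9, 1);
        for (int i = 0; i < 5; i++) {
            System.out.println(g.getAsInt());
        }
    }
}
```

With `weightA` and `weightB` a caller has to look up which argument pairs with which generator; with `weightUniform` and `weightSpecial` the call site documents itself.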
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908732708 From roland at openjdk.org Thu Jan 9 12:55:50 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 9 Jan 2025 12:55:50 GMT Subject: RFR: 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 10:08:13 GMT, Christian Hagedorn wrote: >> This patch is a follow up to the clean-ups done with [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945) and introduces a new predicate visitor for Loop Unswitching to update the last remaining custom predicate cloning code. >> >> This patch includes the following: >> >> - New `CloneUnswitchedLoopPredicatesVisitor` class which delegates the cloning work to a new `ClonePredicateToTargetLoop` class. >> - We walk the predicate chain in the `PredicateIterator` and call the `CloneUnswitchedLoopPredicatesVisitor` for each visited predicate. Then we clone the predicate on the fly to the target loop. >> - New `ClonePredicateToTargetLoop` class: >> - Clones Parse Predicates >> - Clones Template Assertion Predicates >> - Includes rewiring of control dependent data nodes >> - Rewires the cloned predicates to the target loop with new `TargetLoopPredicateChain` class: >> - Keeps track of the current chain head, which is the target loop itself when the chain is still empty. >> - Each time a new predicate is inserted at the target loop, the old predicate chain head is set as output of the new predicate. >> - An example is shown as class comment at `TargetLoopPredicateChain`. >> - I plan to reuse this class later again when also updating `CreateAssertionPredicatesVisitor` which is done when we tackle the actual still remaining Assertion Predicate bugs. >> - Removal of custom predicate cloning code found in `PhaseIdealLoop`. >> - Changed steps performed in Loop Unswitching from: >> 1. Clone loop >> 2. 
Clone predicates and insert them below the unswitched loop selector If projections >> 3. Connect the cloned predicates to the unswitched loops >> >> to: >> >> 1. Clone loop >> 2. Connect unswitched loop selector If projections to unswitched loops such that they are now the new loop entries >> 3. Clone predicates and insert them between the unswitched loop selector If projections and the unswitched loops >> - Rename/update `get_template_assertion_predicates()`/`TemplateAssertionPredicateCollector` to reflect the only use left. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8344035 > - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22828#pullrequestreview-2539842846 From epeter at openjdk.org Thu Jan 9 12:58:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 12:58:57 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: <-YCKs9iKEe2_h6gglJivRSNDvKbyPcCWU5CkfTP_XqA=.6ec2cac7-d494-4f24-99e5-b86090889b0e@github.com> References: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> <-YCKs9iKEe2_h6gglJivRSNDvKbyPcCWU5CkfTP_XqA=.6ec2cac7-d494-4f24-99e5-b86090889b0e@github.com> Message-ID: On Thu, 9 Jan 2025 12:44:30 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/gcm.cpp line 793: >> >>> 791: if (b == initial_mem_block && !initial_mem->is_Phi()) { >>> 792: // If we are in the initial memory block, and initial_mem is not itself >>> 793: // a Phi, no Phis in the block can be initial memory states. >> >> I'm confused when I read this. As said above, we need a clear definition of `initial`. > > Can you explain why `no Phis in the block can be initial memory states.`? I'm probably missing something obvious. Is it that Phi's could exist for aliasing memory, but it would be "above" `initial_mem`, and therefore irrelevant? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908738281 From epeter at openjdk.org Thu Jan 9 12:58:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 12:58:56 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: <7Ug2vWAiJTao6xkZxWb1KKiGCAjkoJPCHnNML4t7H2w=.a8896ec0-83ca-4019-9b3c-3ea3b06d4a98@github.com> References: <-JSWKd-qu_IPFR7xS_epeyPiqVg7K_aAYSQCD6BubBI=.9381a704-662d-4ca8-b191-a60f3c30f460@github.com> <7Ug2vWAiJTao6xkZxWb1KKiGCAjkoJPCHnNML4t7H2w=.a8896ec0-83ca-4019-9b3c-3ea3b06d4a98@github.com> Message-ID: On Thu, 9 Jan 2025 09:25:31 GMT, Emanuel Peter wrote: >> My understanding is that anti-dependence edges are only relevant for local scheduling (within blocks). Because Phis merge control-flow paths (by definition at the start of blocks), I would say it makes little sense to add an anti-dependence edge to Phi nodes. Does it make sense semantically to schedule loads before Phi nodes within a block? I don't think so, but I may be wrong. >> >> I think what you are getting at is, at the block level, whether or not it is possible to raise the LCA above the Phi itself, rather than before the relevant inputs. That would make the scheduling less conservative. Have a look at [this comment](https://github.com/openjdk/jdk/pull/22852/files#diff-13dc4f80ba6ccaa27b0612318074e35200ffe9314405e30ace331807e56b5f60L870-L876) in the source. There are previous attempts at making the LCA raising less conservative in this manner (see, e.g., [JDK-8192992](https://bugs.openjdk.org/browse/JDK-8192992)), but it turns out to be quite tricky to get right. It is definitely an issue separate from the present one! > > Hmm, ok. I think the description here should be made more clear then, and explain what the strategy is, and attempt a kind of informal proof why this is an ok approach. What do you think? 
> > I mean your comment says there are other cases, but it does not really reassure me that we have all cases covered. Another naming question: `other relevant initial memory states` What does the `initial` mean to you? To me, it is just the memory state of the load. `initial` only because we also look at other memory states. If my definition is correct, then I wonder if talking about "other initial memory states" makes sense, or if they should have a different name for clarity. Maybe `root_memory_state` or alike. That would make sense: We have multiple trees, starting at `root_memory_state` each. `initial_mem` is one of them. `initial` only refers to things in the initial block, where the `initial_mem` is located. Maybe you have a different definition - but it would be good to have a clear one stated in the comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908724253 From epeter at openjdk.org Thu Jan 9 12:58:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 12:58:55 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: Message-ID: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> On Tue, 7 Jan 2025 18:03:21 GMT, Daniel Lundén wrote: >> When searching for load anti dependences in GCM, it is not always sufficient to just search starting at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks. >> >> #### Example 1 >> >> Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti dependency for node 64 membar_release. 
To discover this anti dependency, we must rather search from 119 Phi which contains overlapping memory slices with 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). If we only search from 107 Phi, we fail to add the anti dependency to 64 membar_release and do not force the load to schedule before 64 membar_release as we should. In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti dependency on 64 membar_release). The result is the failure we see in this issue (we load the wrong value). >> >> ![failure-graph-1](https://github.com/user-attachments/assets/e5458646-7a5c-40e1-b1d8-e3f101e29b73) >> ![failure-blocks-1](https://github.com/user-attachments/assets/a0b1f724-0809-4b2f-9feb-93e9c59a5d6a) >> >> #### Example 2 >> >> There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9. >> >> ![failure-graph-2](https://github.com/user-attachments/assets/ede0c299-6251-4ff8-8b84-af40a1ee9e8c) >> ![failure-blocks-2](https://github.com/user-attachments/assets/e5a87e43-b6fe-4fa3-8961-54752f63633e) >> >> ### Cha... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Updates after comments I'm getting closer to understanding what you are doing. I have some more questions and suggestions. I think @rwestrel should also have a look at this, he has recently fixed a bug in this code. src/hotspot/share/opto/gcm.cpp line 781: > 779: // If the load has an explicit control input, walk up the dominator tree > 780: // from the early block (inclusive) to the initial memory block > 781: // (inclusive). If we in a block find memory Phi(s) that can alias "If we in a block find" sounds a little strange. Suggestion: // (inclusive). When traversing the blocks, we look for Phi(s) that can alias src/hotspot/share/opto/gcm.cpp line 789: > 787: // initial_mem_block->_idom). The loop below always terminates because the > 788: // root block strictly dominates initial_mem_block. > 789: while (b != initial_mem_block->_idom) { Could you write a `for` instead? `for(Block* b = early; b != initial_mem_block->_idom; b = b->_idom) {` Having the initial, exit-check and iteration-step together makes it a little more readable, I think. src/hotspot/share/opto/gcm.cpp line 793: > 791: if (b == initial_mem_block && !initial_mem->is_Phi()) { > 792: // If we are in the initial memory block, and initial_mem is not itself > 793: // a Phi, no Phis in the block can be initial memory states. I'm confused when I read this. As said above, we need a clear definition of `initial`. ------------- Changes requested by epeter (Reviewer). 
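The shape of the suggested `for` loop can be sketched outside of HotSpot. The following is a simplified Java model, not the C++ code in gcm.cpp: the `Block` class, its `idom` field and the `walk` helper are hypothetical, and the sketch assumes that the initial memory block dominates the early block, as the quoted comment states.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a dominator-tree walk; not the HotSpot implementation.
final class DomWalkSketch {
    static final class Block {
        final String name;
        final Block idom; // immediate dominator; null for the root block

        Block(String name, Block idom) {
            this.name = name;
            this.idom = idom;
        }
    }

    /**
     * Visits blocks from `early` (inclusive) up to `initialMemBlock` (inclusive).
     * Assumes initialMemBlock dominates early; otherwise the walk would run
     * past the root and fail with a NullPointerException.
     */
    static List<String> walk(Block early, Block initialMemBlock) {
        List<String> visited = new ArrayList<>();
        // Initialization, exit check and step sit together in the loop header.
        for (Block b = early; b != initialMemBlock.idom; b = b.idom) {
            visited.add(b.name);
        }
        return visited;
    }

    public static void main(String[] args) {
        Block root = new Block("root", null);
        Block b2 = new Block("B2", root);
        Block b6 = new Block("B6", b2);
        Block b9 = new Block("B9", b6);
        System.out.println(walk(b9, b2)); // prints [B9, B6, B2]
    }
}
```

Stopping at `initialMemBlock.idom` rather than at `initialMemBlock` is what makes the initial memory block itself part of the walk.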
PR Review: https://git.openjdk.org/jdk/pull/22852#pullrequestreview-2539792186 PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908705090 PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908737344 PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908729513 From epeter at openjdk.org Thu Jan 9 12:58:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 12:58:57 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> References: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> Message-ID: <-YCKs9iKEe2_h6gglJivRSNDvKbyPcCWU5CkfTP_XqA=.6ec2cac7-d494-4f24-99e5-b86090889b0e@github.com> On Thu, 9 Jan 2025 12:43:31 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 793: > >> 791: if (b == initial_mem_block && !initial_mem->is_Phi()) { >> 792: // If we are in the initial memory block, and initial_mem is not itself >> 793: // a Phi, no Phis in the block can be initial memory states. > > I'm confused when I read this. As said above, we need a clear definition of `initial`. Can you explain why `no Phis in the block can be initial memory states.`? I'm probably missing something obvious. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1908731081 From roland at openjdk.org Thu Jan 9 13:04:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 9 Jan 2025 13:04:25 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi [v3] In-Reply-To: References: Message-ID: > The assert fires during split thru phi because a call to `Identity` > returns a new node (a constant null pointer). That happens because a > `Load`, once pushed thru phi, can be constant folded because it loads > from a newly allocated array. `Identity` shouldn't return new > nodes. When split thru phi runs, in this case, `Value` should be the > one returning constant null, not `Identity`. There is logic for that > in `LoadNode::Value` but it's after some other checks that cause > `Value` to return too early. > > To fix this, I propose reordering checks in `LoadNode::Value`. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/memnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22818/files - new: https://git.openjdk.org/jdk/pull/22818/files/f768e40b..695d825f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22818&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22818&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22818/head:pull/22818 PR: https://git.openjdk.org/jdk/pull/22818 From tweidmann at openjdk.org Thu Jan 9 13:07:07 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 13:07:07 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v12] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: 
<9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
Theo Weidmann has updated the pull request incrementally with three additional commits since the last revision: - Merge branch 'JDK-8346107-Generators' of https://github.com/theoweidmannoracle/jdk into JDK-8346107-Generators - Update Generators.java - Stylistic improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/536f7448..0750516b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=10-11 Stats: 15 lines in 3 files changed: 7 ins; 2 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From qamai at openjdk.org Thu Jan 9 13:23:37 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 9 Jan 2025 13:23:37 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 10:22:12 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. 
>> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Copyright date Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22967#pullrequestreview-2539897229 From thartmann at openjdk.org Thu Jan 9 13:23:38 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 9 Jan 2025 13:23:38 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 13:18:32 GMT, Quan Anh Mai wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Copyright date > > Marked as reviewed by qamai (Committer). Thanks again for the review, @merykitty! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2580140315 From epeter at openjdk.org Thu Jan 9 13:28:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 13:28:50 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: References: Message-ID: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> On Fri, 3 Jan 2025 20:42:15 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. 
Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > Updating copyright year of modified files. We are on the final approach. Just a few small comments / suggestions left. src/hotspot/share/opto/convertnode.cpp line 991: > 989: return Op_MinHF; > 990: default: > 991: return false; Is that a sane return value? Should we not assert here? src/hotspot/share/opto/library_call.cpp line 8665: > 8663: fatal_unexpected_iid(id); > 8664: break; > 8665: } Suggestion: switch (id) { // Unary operations case vmIntrinsics::_sqrt_float16: result = _gvn.transform(new SqrtHFNode(C, control(), fld1)); break; // Ternary operations case vmIntrinsics::_fma_float16: result = _gvn.transform(new FmaHFNode(control(), fld1, fld2, fld3)); break; default: fatal_unexpected_iid(id); break; } Formatting could be improved. In the other switch you indent the cases. The lines are also a little long. src/hotspot/share/opto/mulnode.cpp line 560: > 558: // Compute the product type of two half float ranges into this node. > 559: const Type* MulHFNode::mul_ring(const Type* t0, const Type* t1) const { > 560: if(t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT) return Type::HALF_FLOAT; Suggestion: if(t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT) { return Type::HALF_FLOAT; } src/hotspot/share/opto/superword.cpp line 2567: > 2565: // half float to float, in such a case back propagation of narrow type (SHORT) > 2566: // may not be possible. > 2567: if (n->Opcode() == Op_ConvF2HF || n->Opcode() == Op_ReinterpretHF2S) { Is this relevant, or does that belong to a different (vector) RFE? 
src/hotspot/share/opto/type.cpp line 460: > 458: RETURN_ADDRESS=make(Return_Address); > 459: FLOAT = make(FloatBot); // All floats > 460: HALF_FLOAT = make(HalfFloatBot); // All half floats Suggestion: HALF_FLOAT = make(HalfFloatBot); // All half floats src/hotspot/share/opto/type.cpp line 1092: > 1090: if (_base == DoubleTop || _base == DoubleBot) return Type::BOTTOM; > 1091: typerr(t); > 1092: return Type::BOTTOM; Please use curly-braces even for single-line ifs src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1434: > 1432: return float16ToRawShortBits(valueOf(product + float16ToFloat(f16c))); > 1433: }); > 1434: return shortBitsToFloat16(res); I don't understand what is happening here. But I leave this to @PaulSandoz to review ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2539863536 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908759602 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908769721 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908771698 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908776380 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908777422 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908779530 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908792237 From epeter at openjdk.org Thu Jan 9 13:28:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 13:28:51 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> References: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> Message-ID: On Thu, 9 Jan 2025 13:14:13 GMT, Emanuel Peter wrote: >> Jatin Bhateja has 
refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/share/opto/type.cpp line 460: > >> 458: RETURN_ADDRESS=make(Return_Address); >> 459: FLOAT = make(FloatBot); // All floats >> 460: HALF_FLOAT = make(HalfFloatBot); // All half floats > > Suggestion: > > HALF_FLOAT = make(HalfFloatBot); // All half floats If alignment is already broken, we might as well just use single spaces. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908778435 From epeter at openjdk.org Thu Jan 9 13:28:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 13:28:52 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 18:42:48 GMT, Joe Darcy wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding more test points > > src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 35: > >> 33: * The class {@code Float16Math} constains intrinsic entry points corresponding >> 34: * to scalar numeric operations defined in Float16 class. >> 35: * @author > > Please remove all author tags. We haven't used them in new code in the JDK for some time. @jatin-bhateja did you remove them? 
I still see an `@author` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1908788132 From tweidmann at openjdk.org Thu Jan 9 13:30:54 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 13:30:54 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v13] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <8dZM81pwiVXJQg2tpasXQimOw6uK7Hx1Gn59u1kz5J0=.5ca6e27f-298a-4a7b-bee1-826f8a009726@github.com> > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
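The idea of mixing uniformly random values with "interesting" ones can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `MixedLongsSketch`, its method signatures, and the notion that "special" longs are values near powers of two are all hypothetical here, and this is not the API of the `Generators` library under review.

```java
import java.util.Random;
import java.util.function.LongSupplier;

// Hypothetical sketch; NOT the API of compiler.lib.generators.
final class MixedLongsSketch {

    // Uniformly distributed longs over the whole long range.
    static LongSupplier uniformLongs(Random rng) {
        return rng::nextLong;
    }

    // Assumption: "special" values are those within +/- range of a power of two,
    // since such values tend to sit close to overflow boundaries.
    static LongSupplier specialLongs(Random rng, int range) {
        if (range < 0 || range > (Integer.MAX_VALUE - 1) / 2) {
            throw new IllegalArgumentException("range out of bounds: " + range);
        }
        return () -> {
            long powerOfTwo = 1L << rng.nextInt(64);                 // 2^0 .. 2^63 (the last one is Long.MIN_VALUE)
            long offset = (long) rng.nextInt(2 * range + 1) - range; // in [-range, range]
            return powerOfTwo + offset;                              // may wrap around, which is intended
        };
    }

    // Draws from a with probability weightA / (weightA + weightB), otherwise from b.
    static LongSupplier mixed(Random rng, LongSupplier a, LongSupplier b, int weightA, int weightB) {
        int total = weightA + weightB;
        if (weightA < 0 || weightB < 0 || total <= 0) {
            throw new IllegalArgumentException("weights must be non-negative with a positive sum");
        }
        return () -> rng.nextInt(total) < weightA ? a.getAsLong() : b.getAsLong();
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        LongSupplier gen = mixed(rng, uniformLongs(rng), specialLongs(rng, 2), 1, 1);
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.getAsLong());
        }
    }
}
```

Composing small suppliers like this keeps each distribution testable on its own; the real library additionally supports generic types and range restriction, which this sketch leaves out.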
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Some more fuzzy tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/0750516b..981afcf8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=11-12 Stats: 116 lines in 1 file changed: 84 ins; 32 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From epeter at openjdk.org Thu Jan 9 13:42:38 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 13:42:38 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v13] In-Reply-To: <8dZM81pwiVXJQg2tpasXQimOw6uK7Hx1Gn59u1kz5J0=.5ca6e27f-298a-4a7b-bee1-826f8a009726@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <8dZM81pwiVXJQg2tpasXQimOw6uK7Hx1Gn59u1kz5J0=.5ca6e27f-298a-4a7b-bee1-826f8a009726@github.com> Message-ID: On Thu, 9 Jan 2025 13:30:54 GMT, Theo Weidmann wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. 
>>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Some more fuzzy tests test/hotspot/jtreg/compiler/lib/generators/Generators.java line 339: > 337: public RestrictableGenerator uniformLongsMixedWithSpecials(int weightA, int weightB, int rangeSpecial) { > 338: return mixed(uniformLongs(), specialLongs(rangeSpecial), weightA, weightB); > 339: } You need to fix this one too ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908818258 From epeter at openjdk.org Thu Jan 9 13:42:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 13:42:39 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v10] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <3RdTHsd8xielC9SojAPzQvcBJrL3pYzdcoghS32hTvE=.1d107ef2-973b-4c2d-ac1d-20dea821e94b@github.com> Message-ID: On Thu, 9 Jan 2025 12:43:33 GMT, Theo Weidmann wrote: >> test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 480: >> >>> 478: mockSource.checkEmpty().enqueueFloat(2, 5, 4); >>> 479: var f4 = mockGS.safeRestrictFloat(mockGS.uniformFloats(0, 1),2, 5); >>> 480: Asserts.assertEQ(f4.next(), 4f); >> >> You could test `safeRestrict...` on all generators from `G`, and also the random ones like `G.ints()` >> >> Pick random restriction bounds, and 
just check that the results are in bounds > > I don't think it's necessary to test safeRestrict more extensively. These test cases already exercise all branches and paths through safeRestrict. Ok, fine with me ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1908819863 From tweidmann at openjdk.org Thu Jan 9 14:02:10 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 14:02:10 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v14] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Update Generators.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/981afcf8..e90223b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=12-13 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From epeter at openjdk.org Thu Jan 9 14:24:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 9 Jan 2025 14:24:46 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v14] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <0HCOKjgXG54f2baH25AhIxmapdGMrxoeV5fB5mEfo1E=.b504bdbe-7d24-4a0d-9f7a-0744135e17a9@github.com> On Thu, 9 Jan 2025 14:02:10 GMT, Theo Weidmann wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. 
>>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Update Generators.java Perfect! Thank you very much for tackling this. Approved. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22941#pullrequestreview-2540058319 From chagedorn at openjdk.org Thu Jan 9 14:32:39 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 9 Jan 2025 14:32:39 GMT Subject: RFR: 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 10:08:13 GMT, Christian Hagedorn wrote: >> This patch is a follow up to the clean-ups done with [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945) and introduces a new predicate visitor for Loop Unswitching to update the last remaining custom predicate cloning code. >> >> This patch includes the following: >> >> - New `CloneUnswitchedLoopPredicatesVisitor` class which delegates the cloning work to a new `ClonePredicateToTargetLoop` class. >> - We walk the predicate chain in the `PredicateIterator` and call the `CloneUnswitchedLoopPredicatesVisitor` for each visited predicate. Then we clone the predicate on the fly to the target loop.
>> - New `ClonePredicateToTargetLoop` class: >> - Clones Parse Predicates >> - Clones Template Assertion Predicates >> - Includes rewiring of control dependent data nodes >> - Rewires the cloned predicates to the target loop with new `TargetLoopPredicateChain` class: >> - Keeps track of the current chain head, which is the target loop itself when the chain is still empty. >> - Each time a new predicate is inserted at the target loop, the old predicate chain head is set as output of the new predicate. >> - An example is shown as class comment at `TargetLoopPredicateChain`. >> - I plan to reuse this class later again when also updating `CreateAssertionPredicatesVisitor` which is done when we tackle the actual still remaining Assertion Predicate bugs. >> - Removal of custom predicate cloning code found in `PhaseIdealLoop`. >> - Changed steps performed in Loop Unswitching from: >> 1. Clone loop >> 2. Clone predicates and insert them below the unswitched loop selector If projections >> 3. Connect the cloned predicates to the unswitched loops >> >> to: >> >> 1. Clone loop >> 2. Connect unswitched loop selector If projections to unswitched loops such that they are now the new loop entries >> 3. Clone predicates and insert them between the unswitched loop selector If projections and the unswitched loops >> - Rename/update `get_template_assertion_predicates()`/`TemplateAssertionPredicateCollector` to reflect the only use left. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8344035 > - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Thanks Roland for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22828#issuecomment-2580380295 From tweidmann at openjdk.org Thu Jan 9 14:35:58 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 14:35:58 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6] In-Reply-To: References: Message-ID: <6fqwGHPo8z6GSKCjH3i0rv9OvxAqKUTTLsdh8aktG0w=.d7a7e025-d7ed-4ef5-8c8d-2a2b3deb0ee3@github.com> > C2 currently emits runtime calls if the platform rules do not support lowering floating-point remainder operations. For example, for float: > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 > > https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 > > However, the only platform that currently supports this is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevents any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: > > > public static double process(final double x) { > double w = (double) 0.1; > double p = 0; > p = (double) (3.109615012413746E307 % (w % Z)); > p = (double) (7.614949555185036E307 / (x % x)); // <- return value only depends on this line > return (double) (x * p); > } > > > To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated.
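For readers less familiar with the operation in question, here is a small sketch of what Java's `%` does on doubles, which is the behavior any constant folding of the remainder has to reproduce. The class and method names are made up for illustration, and the sketch says nothing about how C2 actually implements the folding.

```java
// Illustrative only: Java-level semantics of the floating-point remainder.
final class FloatRemainderSketch {

    // Both operands are constants, so the result is known without any runtime call.
    static double constantOperands() {
        return 7.5 % 2.0; // 1.5
    }

    // For finite, non-zero x this is a zero with the sign of x;
    // for x == 0.0, infinities, or NaN the result is NaN.
    static double selfRemainder(double x) {
        return x % x;
    }

    public static void main(String[] args) {
        System.out.println(constantOperands());   // 1.5
        System.out.println(-7.5 % 2.0);           // -1.5, the sign follows the dividend
        System.out.println(selfRemainder(3.0));   // 0.0
        System.out.println(selfRemainder(0.0));   // NaN
    }
}
```

Unlike integer remainder, the floating-point version never throws, even for a zero divisor, so the only cost of leaving it as an opaque call is the missed optimization.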
Theo Weidmann has updated the pull request incrementally with three additional commits since the last revision: - Address comments - Actually return top - Update divnode.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22786/files - new: https://git.openjdk.org/jdk/pull/22786/files/5f0abf21..4c7f01e6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22786&range=04-05 Stats: 10 lines in 3 files changed: 0 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22786/head:pull/22786 PR: https://git.openjdk.org/jdk/pull/22786 From tweidmann at openjdk.org Thu Jan 9 14:39:43 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 14:39:43 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6] In-Reply-To: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Tue, 7 Jan 2025 18:26:40 GMT, Vladimir Kozlov wrote: >> Theo Weidmann has updated the pull request incrementally with three additional commits since the last revision: >> >> - Address comments >> - Actually return top >> - Update divnode.cpp > > Most likely it will affect performance of 32-bit x86 since it was simple nodes and not Call nodes. But the platforma is going away so it should be fine. > > You need to treat new Mod nodes as leaf calls without side effects instead of general Call nodes. @vnkozlov @eme64 Thank you for your reviews. I think I addressed your comments and will run testing again. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22786#issuecomment-2580408648 From tweidmann at openjdk.org Thu Jan 9 15:23:37 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Thu, 9 Jan 2025 15:23:37 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2] In-Reply-To: References: Message-ID: On Thu, 26 Dec 2024 13:28:29 GMT, Quan Anh Mai wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Combine test files > > I have looked more deeply into the issue with `depends_only_on_test` in general and I have a really deep concern regarding the current state of how it is handled. > > To start with, what is `depends_only_on_test`? Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test, it means that the node depends on the test that is its control input. > > For example: > > if (y != 0) { > if (x > 0) { > if (y != 0) { > x / y; > } > } > } > > Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`. Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: > > if (y != 0) { > x / y; > if (x > 0) { > } > } > > On the other hand, consider this case: > > if (x > 0) { > if (y != 0) { > if (x > 0) { > x / y; > } > } > } > > Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0` which is unrelated, we can see that if we rewire the division to the outer `x > 0` test, the division floats above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation. 
It can change when we transform the graph and it can be different for different nodes of the same kind. > > So, what is the issue in [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822)? When the zero check of the division is split through `Phi` but the division is not, it is wired to the merge point, not the tests themselves. This means that the division no longer `depends_only_on_test` and should return `false`. However, it still reports that it `depends_only_on_test`, which makes `PhaseIdealLoop::dominated_by` move it out of the loop when its control input is moved. > > The fix was to make `PhaseIdealLoop::dominated_by` treat a division as if it does not `depends_only_on_test` when we cannot prove that the divisor is non-zero. This fixed the issue in that particular instance, as it achieved the same result as an actual correct fix of making `depends_only_on_test` return `false`. However, the node still reports itself as `depends_only_on_test`, and that opens more opportunities of miscompilation. > > One important consideration regarding nodes that `depen... @merykitty Thank you for your detailed write-up! @chhagedorn and I talked about it and we also agree with you that there seems to be some fundamental flaw here that needs to be addressed. We should definitely address it soon. It looks like a bigger endeavor though and we should probably file your write-up as RFE so it does not get buried here. Do you think we should give up on this point fix then? Or do you think it's fine if we merge it and address the underlying cause separately? We still believe that there's no harm in applying this band-aid patch in the way I proposed, while, of course, we have to address the underlying issue here too. @rwestrel added pin_array_access_node. Maybe you also want to weigh in on this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2580546412 From kvn at openjdk.org Thu Jan 9 16:31:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Jan 2025 16:31:41 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: <9OqXAax9IpkALXJHRnSuMqUSpo5VJbTVgR-REsMUT3o=.47dab670-dbac-4667-b746-c00992bdeb6a@github.com> On Thu, 9 Jan 2025 10:22:12 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. 
>> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Copyright date src/hotspot/share/opto/library_call.cpp line 4307: > 4305: // Keep track of the fact that 'obj' is an array to prevent > 4306: // array specific accesses from floating above the guard. > 4307: *obj = _gvn.transform(new CastPPNode(is_array_ctrl, *obj, TypeAryPtr::BOTTOM)); Should we do this for above code when layout is known for compiler (`layout_con` is checked)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1909125151 From kvn at openjdk.org Thu Jan 9 16:33:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Jan 2025 16:33:38 GMT Subject: RFR: 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor [v2] In-Reply-To: References: Message-ID: <0zTkSIudaGTzuLw35NxUX2PM9jxg4NK749g1EzaGxWA=.3250fb65-c65d-448a-a99a-95f82fe53a46@github.com> On Mon, 6 Jan 2025 10:08:13 GMT, Christian Hagedorn wrote: >> This patch is a follow up to the clean-ups done with [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945) and introduces a new predicate visitor for Loop Unswitching to update the last remaining custom predicate cloning code. >> >> This patch includes the following: >> >> - New `CloneUnswitchedLoopPredicatesVisitor` class which delegates the cloning work to a new `ClonePredicateToTargetLoop` class. >> - We walk the predicate chain in the `PredicateIterator` and call the `CloneUnswitchedLoopPredicatesVisitor` for each visited predicate. Then we clone the predicate on the fly to the target loop. 
>> - New `ClonePredicateToTargetLoop` class: >> - Clones Parse Predicates >> - Clones Template Assertion Predicates >> - Includes rewiring of control dependent data nodes >> - Rewires the cloned predicates to the target loop with new `TargetLoopPredicateChain` class: >> - Keeps track of the current chain head, which is the target loop itself when the chain is still empty. >> - Each time a new predicate is inserted at the target loop, the old predicate chain head is set as output of the new predicate. >> - An example is shown as class comment at `TargetLoopPredicateChain`. >> - I plan to reuse this class later again when also updating `CreateAssertionPredicatesVisitor` which is done when we tackle the actual still remaining Assertion Predicate bugs. >> - Removal of custom predicate cloning code found in `PhaseIdealLoop`. >> - Changed steps performed in Loop Unswitching from: >> 1. Clone loop >> 2. Clone predicates and insert them below the unswitched loop selector If projections >> 3. Connect the cloned predicates to the unswitched loops >> >> to: >> >> 1. Clone loop >> 2. Connect unswitched loop selector If projections to unswitched loops such that they are now the new loop entries >> 3. Clone predicates and insert them between the unswitched loop selector If projections and the unswitched loops >> - Rename/update `get_template_assertion_predicates()`/`TemplateAssertionPredicateCollector` to reflect the only use left. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8344035 > - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22828#pullrequestreview-2540400730 From dlunden at openjdk.org Thu Jan 9 16:53:54 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 9 Jan 2025 16:53:54 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v3] In-Reply-To: References: Message-ID: > When searching for load anti dependences in GCM, it is not always sufficient to just search starting at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks. > > #### Example 1 > > Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti dependency for node 64 membar_release. To discover this anti dependency, we must rather search from 119 Phi which contains overlapping memory slices with 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). If we only search from 107 Phi, we fail to add the anti dependency to 64 membar_release and do not force the load to schedule before 64 membar_release as we should. In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti dependency on 64 membar_release). The result is the failure we see in this issue (we load the wrong value).
> > ![failure-graph-1](https://github.com/user-attachments/assets/e5458646-7a5c-40e1-b1d8-e3f101e29b73) > ![failure-blocks-1](https://github.com/user-attachments/assets/a0b1f724-0809-4b2f-9feb-93e9c59a5d6a) > > #### Example 2 > > There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9. > > ![failure-graph-2](https://github.com/user-attachments/assets/ede0c299-6251-4ff8-8b84-af40a1ee9e8c) > ![failure-blocks-2](https://github.com/user-attachments/assets/e5a87e43-b6fe-4fa3-8961-54752f63633e) > > ### Changeset > > - Update `PhaseCFG::insert... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/gcm.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22852/files - new: https://git.openjdk.org/jdk/pull/22852/files/9ec33d53..fcd8bae3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22852&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22852&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22852.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22852/head:pull/22852 PR: https://git.openjdk.org/jdk/pull/22852 From dlunden at openjdk.org Thu Jan 9 16:53:55 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 9 Jan 2025 16:53:55 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: <-JSWKd-qu_IPFR7xS_epeyPiqVg7K_aAYSQCD6BubBI=.9381a704-662d-4ca8-b191-a60f3c30f460@github.com> <7Ug2vWAiJTao6xkZxWb1KKiGCAjkoJPCHnNML4t7H2w=.a8896ec0-83ca-4019-9b3c-3ea3b06d4a98@github.com> Message-ID: <6nEJ2QFOab7ny1vcjD9UZwz0MouPxR6IkCIRJL2mqcE=.12fef553-2a89-4061-97a7-58c9c8315f46@github.com> On Thu, 9 Jan 2025 12:40:10 GMT, Emanuel Peter wrote: >> Hmm, ok. I think the description here should be made more clear then, and explain what the strategy is, and attempt a kind of informal proof why this is an ok approach. What do you think? >> >> I mean your comment says there are other cases, but it does not really reassure me that we have all cases covered ? > > Another naming question: > `other relevant initial memory states` > What does the `initial` mean to you? To me, it is just the memory state of the load. `initial` only because we also look at other memory states.
> > If my definition is correct, then I wonder if talking about "other initial memory states" makes sense, or if they should have a different name for clarity. Maybe `root_memory_state` or alike. That would make sense: We have multiple trees, starting at `root_memory_state` each. `initial_mem` is one of them. `initial` only refers to things in the initial block, where the `initial_mem` is located. > > Maybe you have a different definition - but it would be good to have a clear one stated in the comments. > Hmm, ok. I think the description here should be made more clear then, and explain what the strategy is, and attempt a kind of informal proof why this is an ok approach. What do you think? Do you mean a general proof sketch of the approach in `PhaseCFG::insert_anti_dependences`? This changeset does not really change the approach itself, but just extends the search to more initial memory states. > but it does not really reassure me that we have all cases covered I am not fully reassured we cover all cases either, but we at least now cover more cases than before (and specifically the cases that appear for this issue). See also the earlier comment by Roberto on investigating whether we can somehow enforce in earlier phases the invariant that the original version of `PhaseCFG::insert_anti_dependences` assumed: that we only need to search from one initial memory state. I would suggest doing this in a separate RFE. > What does the initial mean to you? To me it means the initial memory states that we start our searches at. I'm fine with your renaming suggestion, calling them roots. I would perhaps suggest that we then also say `input_mem` instead of `initial_mem` to signify that it is the actual input to the load.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1909152976 From dlunden at openjdk.org Thu Jan 9 16:56:39 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 9 Jan 2025 16:56:39 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> References: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> Message-ID: On Thu, 9 Jan 2025 12:48:30 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Updates after comments > > src/hotspot/share/opto/gcm.cpp line 789: > >> 787: // initial_mem_block->_idom). The loop below always terminates because the >> 788: // root block strictly dominates initial_mem_block. >> 789: while (b != initial_mem_block->_idom) { > > Could you write a `for` instead? > `for(Block* b = early; b != initial_mem_block->_idom; b = b->_idom) {` > > Having the initial, exit-check and iteration-step together makes it a little more readable, I think. Sure, sounds good. I'll update it.
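[Illustration, not part of the original message] The `for` form suggested in the review can be sketched in isolation as below. This is a minimal sketch under stated assumptions: the `Block` struct here is a hypothetical stand-in that carries only an `_idom` link and an `id`, and `dominator_path` is an invented helper name; neither is HotSpot's actual code. It only shows the shape of walking the dominator tree from the early block up to and including the initial memory block.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for HotSpot's Block class: only the
// immediate-dominator link (nullptr for the root) and an id for inspection.
struct Block {
  int id;
  Block* _idom;
};

// Hypothetical helper: collects the blocks on the dominator-tree path from
// `early` up to and including `initial_mem_block`.
// Precondition (assumed, as in the thread): initial_mem_block dominates early.
std::vector<const Block*> dominator_path(const Block* early,
                                         const Block* initial_mem_block) {
  assert(early != nullptr && initial_mem_block != nullptr);
  std::vector<const Block*> path;
  // Init, exit check, and step sit together in the loop header. The walk
  // stops once b reaches the immediate dominator of initial_mem_block.
  for (const Block* b = early; b != initial_mem_block->_idom; b = b->_idom) {
    assert(b != nullptr && "initial_mem_block must dominate early");
    path.push_back(b);
  }
  return path;
}
```

With a chain root <- B1 <- B2 <- B3, calling `dominator_path(&B3, &B1)` visits B3, B2, and B1 in that order and stops before the root.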
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1909159855 From dlunden at openjdk.org Thu Jan 9 16:56:39 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 9 Jan 2025 16:56:39 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v2] In-Reply-To: References: <7LvJ-Qnd5pZ1cXZdg1rg-jXzZDKuxL_RKt7DVZKE9S4=.073b470d-0814-4d24-bad1-e76271fe6dfd@github.com> <-YCKs9iKEe2_h6gglJivRSNDvKbyPcCWU5CkfTP_XqA=.6ec2cac7-d494-4f24-99e5-b86090889b0e@github.com> Message-ID: <1jYJx0jQqOgE6zQnsp8txin57i-Stq2aSLrdq932wOM=.ce3fc96b-acd0-4c93-9fc7-3f156a188a9f@github.com> On Thu, 9 Jan 2025 12:49:04 GMT, Emanuel Peter wrote: > Is it that Phi's could exist for aliasing memory, but it would be "above" initial_mem, and therefore irrelevant? Yes, exactly. I'll clarify it in the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22852#discussion_r1909159357 From kvn at openjdk.org Thu Jan 9 17:00:41 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 9 Jan 2025 17:00:41 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6] In-Reply-To: References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Thu, 9 Jan 2025 10:02:59 GMT, Theo Weidmann wrote: >>> Why you need the case for ModDNode? >> >> As @eme64 already explained this function is both for float and double. "floating point" here is supposed to mean both float and double. >> >>> Why not make it a BasicType bt instead of dbl, and then switch on that? Might be more readable than true / false. >> I read floating_point_mod(a, b, true), and am not sure what the true does. >> >> Good point. I will change it. >> >>> Why do you need the static_cast? I mean why not use the common type ModFloatingNode*, which is a subtype of CallNode*, right?
>> >> The cast is necessary because of the ternary operator but you are right that ModFloatingNode could be used as a more concrete subtype here. I will change it. > > Actually the assignment further down from `as_Call` fails, so I'll leave it with CallNode. I have a typo in my original question. What I meant to ask is: why you have cast only for ModDNode and not for ModFNode? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1909165263 From dlunden at openjdk.org Thu Jan 9 17:08:53 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 9 Jan 2025 17:08:53 GMT Subject: RFR: 8333393: PhaseCFG::insert_anti_dependences can fail to raise LCAs and to add necessary anti-dependence edges [v4] In-Reply-To: References: Message-ID: > When searching for load anti dependences in GCM, it is not always sufficient to just search starting at the direct initial memory input to the load. Specifically, there are cases when we must also search for anti dependences starting at relevant Phi memory nodes in between the load's early block and the initial memory input's block. Here, "in between" refers to blocks in the dominator tree in between the early and initial memory blocks. > > #### Example 1 > > Consider the ideal graph below. The initial memory for 183 loadI is 107 Phi and there is an important anti dependency for node 64 membar_release. To discover this anti dependency, we must rather search from 119 Phi which contains overlapping memory slices with 107 Phi. Looking at the ideal graph block view, we see that both 107 Phi and 119 Phi are in the initial memory block (B7) and thus dominate the early block (B20). If we only search from 107 Phi, we fail to add the anti dependency to 64 membar_release and do not force the load to schedule before 64 membar_release as we should.
In the block view, we see that the load is actually scheduled in B24 _after_ a number of anti-dependent stores, the first of which is in block B20 (corresponding to the anti dependency on 64 membar_release). The result is the failure we see in this issue (we load the wrong value). > > ![failure-graph-1](https://github.com/user-attachments/assets/e5458646-7a5c-40e1-b1d8-e3f101e29b73) > ![failure-blocks-1](https://github.com/user-attachments/assets/a0b1f724-0809-4b2f-9feb-93e9c59a5d6a) > > #### Example 2 > > There are also situations when we need to start searching from Phis that are strictly in between the initial memory block and early block. Consider the ideal graph below. The initial memory for 100 loadI is 18 MachProj, but we also need to search from 76 Phi to find that we must raise the LCA to the last block on the path between 76 Phi and 75 Phi: B9 (= the load's early block). If we do not search from 76 Phi, the load is again likely scheduled too late (in B11 in the example) after anti-dependent stores (the first of which corresponds to 58 membar_release in B10). Note that the block B6 for 76 Phi is strictly dominated by the initial memory block B2 and also strictly dominates the early block B9. > > ![failure-graph-2](https://github.com/user-attachments/assets/ede0c299-6251-4ff8-8b84-af40a1ee9e8c) > ![failure-blocks-2](https://github.com/user-attachments/assets/e5a87e43-b6fe-4fa3-8961-54752f63633e) > > ### Changeset > > - Update `PhaseCFG::insert... 
Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Fix comma splice in comment - Update after comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22852/files - new: https://git.openjdk.org/jdk/pull/22852/files/fcd8bae3..7390f518 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22852&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22852&range=02-03 Stats: 13 lines in 1 file changed: 3 ins; 4 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22852.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22852/head:pull/22852 PR: https://git.openjdk.org/jdk/pull/22852 From qamai at openjdk.org Thu Jan 9 17:21:39 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 9 Jan 2025 17:21:39 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 15:20:15 GMT, Theo Weidmann wrote: >> I have looked more deeply into the issue with `depends_only_on_test` in general and I have a really deep concern regarding the current state of how it is handled. >> >> To start with, what is `depends_only_on_test`? Given a node `n` with the control input `c`, if `c` can be deduced from `c'` and `n->depends_only_on_test() == true`, then we can rewire the control input of `n` to `c'`. This means that `depends_only_on_test` does not mean that the node depends on a test, it means that the node depends on the test that is its control input. >> >> For example: >> >> if (y != 0) { >> if (x > 0) { >> if (y != 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` `depends_only_on_test` because its control input is the test `y != 0`.
Then, we can rewire the control input of the division to the outer `y != 0`, resulting in: >> >> if (y != 0) { >> x / y; >> if (x > 0) { >> } >> } >> >> On the other hand, consider this case: >> >> if (x > 0) { >> if (y != 0) { >> if (x > 0) { >> x / y; >> } >> } >> } >> >> Then `x/y` does not `depends_only_on_test` because its control input is the test `x > 0` which is unrelated, we can see that if we rewire the division to the outer `x > 0` test, the division floats above the actual test `y != 0`. This means that `depends_only_on_test` is a dynamic property of a node, and not a static property of the division operation. It can change when we transform the graph and it can be different for different nodes of the same kind. >> >> So, what is the issue in [JDK-8257822](https://bugs.openjdk.org/browse/JDK-8257822)? When the zero check of the division is split through `Phi` but the division is not, it is wired to the merge point, not the tests themselves. This means that the division no longer `depends_only_on_test` and should return `false`. However, it still reports that it `depends_only_on_test`, which makes `PhaseIdealLoop::dominated_by` move it out of the loop when its control input is moved. >> >> The fix was to make `PhaseIdealLoop::dominated_by` treat a division as if it does not `depends_only_on_test` when we cannot prove that the divisor is non-zero. This fixed the issue in that particular instance, as it achieved the same result as an actual correct fix of making `depends_only_on_test` return `false`. However, the node still reports itself as `depends_only_on_test`, and that opens more opportunities o... > > @merykitty Thank you for your detailed write-up! @chhagedorn and I talked about it and we also agree with you that there seems to be some fundamental flaw here that needs to be addressed. We should definitely address it soon. It looks like a bigger endeavor though and we should probably file your write-up as RFE so it does not get buried here. 
> > Do you think we should give up on this point fix then? Or do you think it's fine if we merge it and address the underlying cause separately? We still believe that there's no harm in applying this band-aid patch in the way I proposed, while, of course, we have to address the underlying issue here too. > > @rwestrel added pin_array_access_node. Maybe you also want to weigh in on this? @theoweidmannoracle I think this fix is fine, please go ahead > It looks like a bigger endeavor though and we should probably file your write-up as RFE so it does not get buried here. I created https://bugs.openjdk.org/browse/JDK-8347365 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2580858060 From qamai at openjdk.org Thu Jan 9 17:38:42 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 9 Jan 2025 17:38:42 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 07:27:12 GMT, erifan wrote: >> Hi @iwanowww, to share my thoughts on this, there are 2 places when we do lowering: >> >> 1. Macro transform: >> - This place does lowering in a machine-independent manner. This makes it really awkward try to lower something that is highly dependent on the exact architecture. For example, we want to lower a `MulVL` with a constant into `AddVL`s and `LShiftVL`s. On Arm, long vector multiplication can be done pretty efficiently so we want to be conservative. However, on x86, long multiplication is multiple uops and has a massive latency. As a result, we want to be more aggressive in this transformation. Even worse, `vpmullq` is only available on AVX512, so for AVX2, we want to be even more aggressive, maybe even to the point of unconditionally doing the transformation. >> - It still does machine-independent idealisation on all the nodes. This is the opposite of machine-dependent lowering purposes. 
Idealisation tries to simplify the graph so we can do analysis and transformation more easily, while lowering tries to complicate the graph so that the final code can get smaller. For example, let's consider an unsigned vector comparison. During idealisation, we want to keep it as is so that we have an easier time moving it around. However, if the machine does not support unsigned vector comparison, we want to break it down to `x + MIN_VALUE <=> y + MIN_VALUE`. >> >> 2. Matching: >> - This place does not do GVN so we do not have much versatility here. Really this should only lower node in a one-to-one manner if we have `PhaseLowering` from before. >> - Even worse, the matcher uses a custom grammar, which makes it awkward to work with. This leads to some confusing constructs such as `Matcher::pd_clone_node` and `Matcher::pd_clone_address_expressions`. >> >> Furthermore, as it can be seen, there are several patches and to-do work that can benefit from this pass and have mentioned this PR. As a result, I think `PhaseLowering` is a beneficial and necessary addition. >> >> Cheers, >> Quan Anh > > Hi @merykitty I noticed you mentioned the optimization of vector multiplication to shift add. Since I am working on this recently, in order to avoid duplication of work, I'd like to ask if you have any plans to do this? @erifan No I mentioned it because I read your PR :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2580894226 From rehn at openjdk.org Thu Jan 9 17:55:25 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Thu, 9 Jan 2025 17:55:25 GMT Subject: RFR: 8347366: RISC-V: Add extension asserts for CMO instructions Message-ID: Hi please consider this minor improvement. Sanity tested. 
Thanks, Robbin ------------- Commit messages: - Added assert, comments and used templates Changes: https://git.openjdk.org/jdk/pull/23015/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23015&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347366 Stats: 51 lines in 1 file changed: 21 ins; 0 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/23015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23015/head:pull/23015 PR: https://git.openjdk.org/jdk/pull/23015 From phh at openjdk.org Thu Jan 9 19:11:39 2025 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 9 Jan 2025 19:11:39 GMT Subject: [jdk24] RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 09:34:13 GMT, Aleksey Shipilev wrote: > Fixes the JDK 24 regression for standalone CTW runner. Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22964#pullrequestreview-2540730338 From psandoz at openjdk.org Thu Jan 9 19:25:53 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 9 Jan 2025 19:25:53 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> References: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> Message-ID: On Thu, 9 Jan 2025 13:23:19 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Updating copyright year of modified files. 
> > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1434: > >> 1432: return float16ToRawShortBits(valueOf(product + float16ToFloat(f16c))); >> 1433: }); >> 1434: return shortBitsToFloat16(res); > > I don't understand what is happening here. But I leave this to @PaulSandoz to review Uncertain on what bits, but I am guessing it's mostly related to the fallback code in the lambda. To avoid the intrinsics operating on Float16 instances we instead "unpack" the carrier (16bits) values and pass those as arguments to the intrinsic. The fallback (when intrinsification is not supported) also accepts those carrier values as arguments and we convert the carriers to floats, operate on them, convert to the carrier, and then back to float16 on the result. The code in the lambda could potentially be simplified if `Float16Math.fma` accepted six arguments, the first three being the carrier values used by the intrinsic, and the subsequent three being the float16 values used by the fallback. Then we could express the code in the original source in the lambda. I believe when intrinsified there would be no penalty for those extra arguments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1909327094 From aturbanov at openjdk.org Thu Jan 9 20:31:43 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 9 Jan 2025 20:31:43 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v14] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <8u1qwkCip7J8liiArZhJ5T-UxbBl9Q4VYkoFaDIwFkE=.5a024998-c034-4612-92cc-c949352ee448@github.com> On Thu, 9 Jan 2025 14:02:10 GMT, Theo Weidmann wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64.
The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
> > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Update Generators.java test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 420: > 418: // normal restrictions > 419: mockSource.checkEmpty().enqueueInteger(4, 6, 4); > 420: var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5),2, 5); Suggestion: var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5), 2, 5); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1909393909 From jrose at openjdk.org Thu Jan 9 22:47:34 2025 From: jrose at openjdk.org (John R Rose) Date: Thu, 9 Jan 2025 22:47:34 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Wed, 8 Jan 2025 12:41:10 GMT, Emanuel Peter wrote: >> I think it will be better to schedule this patch after https://github.com/openjdk/jdk/pull/22863 >> where these nodes are marked as commutative operations. > > Ah ok, I will put that one in my review queue! There is lots that have accumulated over the last 2 weeks ? 
Suggestion: Align this hash expression and comment with existing ones elsewhere, such as: https://github.com/openjdk/jdk/blob/931914af76932c9b91fc9affd55d24b2562c72d2/src/hotspot/share/opto/addnode.cpp#L47 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1909552410 From fyang at openjdk.org Fri Jan 10 00:29:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Jan 2025 00:29:46 GMT Subject: RFR: 8346787: Fix two C2 IR matching tests for RISC-V [v3] In-Reply-To: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> References: <8behYr4lJA2W-WCl2Vi3wSbOvCubxu27rIPIXV_hiCw=.15410d14-1847-41b2-9c3b-309df605cd29@github.com> Message-ID: On Thu, 9 Jan 2025 02:01:05 GMT, Fei Yang wrote: >> Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: >> TEST: compiler/c2/irTests/ModINodeIdealizationTests.java >> TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java >> >> These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. >> >> Some further information: >> An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Comment Thanks all for the review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22874#issuecomment-2581498490 From fyang at openjdk.org Fri Jan 10 00:29:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Jan 2025 00:29:46 GMT Subject: Integrated: 8346787: Fix two C2 IR matching tests for RISC-V In-Reply-To: References: Message-ID: On Tue, 24 Dec 2024 05:23:42 GMT, Fei Yang wrote: > Two IR matching tests added by [JDK-8332268](https://bugs.openjdk.org/browse/JDK-8332268) are failing on RISC-V: > TEST: compiler/c2/irTests/ModINodeIdealizationTests.java > TEST: compiler/c2/irTests/ModLNodeIdealizationTests.java > > These two tests require conditional move support. See ModINode::Ideal & ModLNode::Ideal [1][2]. But RISC-V base ISA (RV64GCV) does not support conditional move, so we set `ConditionalMoveLimit` to 0 for this CPU platform. This change simply skips these two tests for now. > > Some further information: > An initial version of conditional move based on RISC-V `Zicond` extension has been added by: [JDK-8344306](https://bugs.openjdk.org/browse/JDK-8344306). But that still lacks performance tuning and we will reconsider and adjust this `ConditionalMoveLimit` parameter as we go. New issue: [JDK-8346786](https://bugs.openjdk.org/browse/JDK-8346786). > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L993 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/divnode.cpp#L1253 This pull request has now been integrated.
Changeset: a9351dfe Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/a9351dfec9e69f6d5671b9372a44de999e8ed3e6 Stats: 10 lines in 2 files changed: 4 ins; 0 del; 6 mod 8346787: Fix two C2 IR matching tests for RISC-V Reviewed-by: fjiang, mli, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/22874 From fyang at openjdk.org Fri Jan 10 02:46:34 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Jan 2025 02:46:34 GMT Subject: RFR: 8347366: RISC-V: Add extension asserts for CMO instructions In-Reply-To: References: Message-ID: <74lFiCjnfKX2scrJrE6EXPs1gqPksSCnFwRy53nUsqk=.0b16c89d-e239-4c5c-9285-0700566b9647@github.com> On Thu, 9 Jan 2025 17:50:23 GMT, Robbin Ehn wrote: > Hi please consider this minor improvement. > > Sanity tested. > > Thanks, Robbin Nice cleanup! I only see two typos in code comment. src/hotspot/cpu/riscv/assembler_riscv.hpp line 3094: > 3092: > 3093: // This instruction have some security implication. > 3094: // At this time it's not likley to be enable for user mode. Suggestion: `// At this time it's not likely to be enabled for user mode.` ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23015#pullrequestreview-2541446072 PR Review Comment: https://git.openjdk.org/jdk/pull/23015#discussion_r1909715422 From vlivanov at openjdk.org Fri Jan 10 06:01:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 10 Jan 2025 06:01:50 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v8] In-Reply-To: References: Message-ID: <5CthSgAegJEfh1cUF81IZrAWVz1TrSpX5kU7swnO61A=.4f05045a-4e46-4492-ad42-0c9fa8f2a156@github.com> On Thu, 9 Jan 2025 05:45:08 GMT, Amit Kumar wrote: >> Lazy computation of TypeFunc. >> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: > > - Merge branch 'master' into tf_v2 > - fix test case > - new year > - fixes code style, restores gc changes > - [wip] renamed caches, changed type factories, refactored code > - Merge branch 'master' into tf_v2 > - test fix > - Merge branch 'master' into tf_v2 > - fixing the merge conflict > - cover more TypeFunc objects > - ... and 7 more: https://git.openjdk.org/jdk/compare/a46ae703...22d97e4c Looks good! Thanks! > I can follow the same approach I followed for `CallNode` and `ArrayCopyNode` structure, if that would be fine ? I also feel that same change can be done for `CallNode` and `ArrayCopyNode` as they also need their own initializer. I am fine with both, fixing it here or fixing with separate RFE. Up to you. No strong preferences on my side. I'm fine with the current shape of the patch. src/hotspot/share/opto/runtime.cpp line 253: > 251: const TypeFunc* OptoRuntime::_osr_end_Type = nullptr; > 252: const TypeFunc* OptoRuntime::_register_finalizer_Type = nullptr; > 253: JFR_ONLY( For a multi-line case, `#ifdef INCLUDE_JFR` would look better here. src/hotspot/share/opto/runtime.hpp line 135: > 133: > 134: // static TypeFunc* data members > 135: static const TypeFunc *_new_instance_Type; Please, spell it as `TypeFunc* `. src/hotspot/share/opto/runtime.hpp line 193: > 191: static const TypeFunc *_osr_end_Type; > 192: static const TypeFunc *_register_finalizer_Type; > 193: JFR_ONLY(static const TypeFunc *_class_id_load_barrier_Type;) Same here: `#ifdef INCLUDE_JFR` would look better. src/hotspot/share/opto/runtime.hpp line 311: > 309: // ====================================================== > 310: > 311: static inline const TypeFunc *new_instance_Type() { Same: please, spell it as `TypeFunc*` (and subsequent declarations). ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21782#pullrequestreview-2541599232 PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2581841146 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1909864110 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1909864961 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1909865315 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1909865680 From amitkumar at openjdk.org Fri Jan 10 06:45:54 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 10 Jan 2025 06:45:54 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v9] In-Reply-To: References: Message-ID: > Lazy computation of TypeFunc. > > Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: fixes code style ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21782/files - new: https://git.openjdk.org/jdk/pull/21782/files/22d97e4c..7e77d69d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21782&range=07-08 Stats: 92 lines in 2 files changed: 2 ins; 0 del; 90 mod Patch: https://git.openjdk.org/jdk/pull/21782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21782/head:pull/21782 PR: https://git.openjdk.org/jdk/pull/21782 From amitkumar at openjdk.org Fri Jan 10 06:45:58 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 10 Jan 2025 06:45:58 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v8] In-Reply-To: <5CthSgAegJEfh1cUF81IZrAWVz1TrSpX5kU7swnO61A=.4f05045a-4e46-4492-ad42-0c9fa8f2a156@github.com> References: <5CthSgAegJEfh1cUF81IZrAWVz1TrSpX5kU7swnO61A=.4f05045a-4e46-4492-ad42-0c9fa8f2a156@github.com> Message-ID: On Fri, 10 Jan 2025 05:54:32 GMT, Vladimir Ivanov wrote: >> Amit Kumar has updated the pull request with a new target 
base due to a merge or a rebase. The pull request now contains 17 commits: >> >> - Merge branch 'master' into tf_v2 >> - fix test case >> - new year >> - fixes code style, restores gc changes >> - [wip] renamed caches, changed type factories, refactored code >> - Merge branch 'master' into tf_v2 >> - test fix >> - Merge branch 'master' into tf_v2 >> - fixing the merge conflict >> - cover more TypeFunc objects >> - ... and 7 more: https://git.openjdk.org/jdk/compare/a46ae703...22d97e4c > > src/hotspot/share/opto/runtime.cpp line 253: > >> 251: const TypeFunc* OptoRuntime::_osr_end_Type = nullptr; >> 252: const TypeFunc* OptoRuntime::_register_finalizer_Type = nullptr; >> 253: JFR_ONLY( > > For a multi-line case, `#ifdef INCLUDE_JFR` would look better here. updated. > src/hotspot/share/opto/runtime.hpp line 311: > >> 309: // ====================================================== >> 310: >> 311: static inline const TypeFunc *new_instance_Type() { > > Same: please, spell it as `TypeFunc*` (and subsequent declarations). updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1909891871 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1909891667 From amitkumar at openjdk.org Fri Jan 10 06:57:38 2025 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 10 Jan 2025 06:57:38 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v8] In-Reply-To: <5CthSgAegJEfh1cUF81IZrAWVz1TrSpX5kU7swnO61A=.4f05045a-4e46-4492-ad42-0c9fa8f2a156@github.com> References: <5CthSgAegJEfh1cUF81IZrAWVz1TrSpX5kU7swnO61A=.4f05045a-4e46-4492-ad42-0c9fa8f2a156@github.com> Message-ID: On Fri, 10 Jan 2025 05:59:39 GMT, Vladimir Ivanov wrote: > > I can follow the same approach I followed for `CallNode` and `ArrayCopyNode` structure, if that would be fine ? I also feel that same change can be done for `CallNode` and `ArrayCopyNode` as they also need their own initializer. I am fine with both, fixing it here or fixing with separate RFE. 
> > Up to you. No strong preferences on my side. I'm fine with the current shape of the patch. I have opened https://bugs.openjdk.org/browse/JDK-8347396 for now. I will take this once current PR merges. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2581915996 From thartmann at openjdk.org Fri Jan 10 07:08:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 10 Jan 2025 07:08:51 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: <9OqXAax9IpkALXJHRnSuMqUSpo5VJbTVgR-REsMUT3o=.47dab670-dbac-4667-b746-c00992bdeb6a@github.com> References: <9OqXAax9IpkALXJHRnSuMqUSpo5VJbTVgR-REsMUT3o=.47dab670-dbac-4667-b746-c00992bdeb6a@github.com> Message-ID: <8QuLkvAR5QduPWwpwK-Q72lhDHuLDXAfglCIoBb0WyU=.323176ab-777b-4aa6-9a39-50ca78c4effd@github.com> On Thu, 9 Jan 2025 16:28:46 GMT, Vladimir Kozlov wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Copyright date > > src/hotspot/share/opto/library_call.cpp line 4307: > >> 4305: // Keep track of the fact that 'obj' is an array to prevent >> 4306: // array specific accesses from floating above the guard. >> 4307: *obj = _gvn.transform(new CastPPNode(is_array_ctrl, *obj, TypeAryPtr::BOTTOM)); > > Should we do this for above code when layout is known for compiler (`layout_con` is checked)? I thought about this as well but I don't think it's necessary because: - No cast is needed if we know the type already - We don't emit a guard and only one branch remains, so there is no risk of the array specific access floating above So I went with the simplest changes for now, also since we need to backport this change (it got already much more complicated than I was aiming for). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1909916119 From epeter at openjdk.org Fri Jan 10 07:15:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Jan 2025 07:15:06 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs Message-ID: `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8335676). Why did we not hit this assert before? `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. But now we found a second case: https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". 
https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105

In the regression test, we have this example:

https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69

We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (which would not constant-fold because of the variable `x = Phi(1000, 2147483647)`), and that clone happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case.

Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow.

I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants:

    if (in(Init)->is_Con() && in(Limit)->is_Con()) return nullptr; // Value

The assumption seems to be that we want `Value` to do the constant folding here, but of course we did not constant-fold because we had detected the overflow in `Value`. Not optimizing further here has an unfortunate consequence: on platforms that do not have `LoopLimit` implemented in the backend directly, we would normally have "lowered" the `LoopLimit` further down into `ConvI2L, SubL, AddL, DivL, ConvL2I ...` nodes. With this check, we never lower it, and end up with a "bad AD", i.e. a compilation bailout in product.

I think this check can reasonably be removed, because `Value` should be called before `Ideal` anyway, and so if we could constant-fold because of constant inputs, we would have already done so.
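The overflow case can be reproduced with plain arithmetic. The following is a minimal Java sketch, not the HotSpot code: the class and method names are hypothetical, and the trip-count formula is an assumption about how the final value is derived for a positive stride with `init <= limit`. It computes the final value in 64-bit arithmetic and only reports a constant when that value survives narrowing to `int`, in the spirit of the `final_con == (jlong)final_int` check.

```java
import java.util.OptionalInt;

public class LoopLimitSketch {
    // Hypothetical helper: the value the induction variable reaches when
    // "for (i = init; i < limit; i += stride)" exits, for stride > 0.
    // Returns empty if that value does not fit in an int (the overflow case).
    static OptionalInt foldLoopLimit(int init, int limit, int stride) {
        if (stride <= 0) {
            throw new IllegalArgumentException("sketch handles positive strides only");
        }
        // Do all arithmetic in long so that the overflow becomes observable.
        long tripCount = ((long) limit - init + (stride - 1)) / stride;
        long finalCon = init + (long) stride * tripCount;
        int finalInt = (int) finalCon;
        // Only fold to a constant if narrowing to int did not lose information.
        return finalCon == (long) finalInt ? OptionalInt.of(finalInt) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // The limit=1000 branch of the Phi folds to a constant.
        System.out.println(foldLoopLimit(0, 1000, 4));
        // The limit=2147483647 branch would end at 2^31, which is not an int.
        System.out.println(foldLoopLimit(0, Integer.MAX_VALUE, 4));
    }
}
```

With `init=0`, `limit=2147483647`, `stride=4`, the trip count is 536870912 and the final value is 2147483648, one past `Integer.MAX_VALUE`, so the sketch declines to fold, just as `Value` does for the temporary clone.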
Note, that the lowering is delayed until `post_loop_opts_phase`, but we never did `record_for_post_loop_opts_igvn`, and so it was not guaranteed that we actually ever processed the `LoopLimitNode` again, which would mean we got "bad AD" again, i.e. compilation bailout in product. ------------- Commit messages: - typo in comment - copyright - Merge branch 'master' into JDK-8335747-LoopLimitNode-overflow - JDK-8335747 Changes: https://git.openjdk.org/jdk/pull/23024/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23024&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335747 Stats: 85 lines in 2 files changed: 76 ins; 3 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/23024.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23024/head:pull/23024 PR: https://git.openjdk.org/jdk/pull/23024 From thartmann at openjdk.org Fri Jan 10 07:46:45 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 10 Jan 2025 07:46:45 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi [v3] In-Reply-To: References: Message-ID: <0xxYIYwEF0toMHBPJyJifF5NKZ49AQ9xoqlQ63sy6TE=.81bfe7a1-75f1-4fc9-a3ad-09ab73c61bf2@github.com> On Thu, 9 Jan 2025 13:04:25 GMT, Roland Westrelin wrote: >> The assert fires during split thru phi because a call to `Identity` >> returns a new node (a constant null pointer). That happens because a >> `Load`, once pushed thru phi, can be constant folded because it loads >> from a newly allocated array. `Identity` shouldn't return new >> nodes. When split thru phi runs, in this case, `Value` should be the >> one returning constant null, not `Identity`. There is logic for that >> in `LoadNode::Value` but it's after some other checks that cause >> `Value` to return too early. >> >> To fix this, I propose reordering checks in `LoadNode::Value`. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/memnode.cpp > > Co-authored-by: Christian Hagedorn Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22818#pullrequestreview-2541737466 From tweidmann at openjdk.org Fri Jan 10 08:06:42 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 08:06:42 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6] In-Reply-To: References: <3T6TjR9TRTlU6nvFobZ5zepPOUuejca18NlXbh5AnyE=.943900a8-296b-4376-b8c0-7275c68f7e49@github.com> Message-ID: On Thu, 9 Jan 2025 16:57:58 GMT, Vladimir Kozlov wrote: >> Actually the assignment further down from `as_Call` fails, so I'll leave it with CallNode. > > I have typo in my original question. What I meant to ask is: why you have cast only for ModDNode and not for ModFNode? The cast is there to make the C++ type checker happy. It's fine if the right hand side of the `:` is a subtype of the left hand side (like my code above with the cast) but the compiler cannot automatically deduce a super type of both the lhs and rhs as the type for the expression (which would be required if I remove the cast). I could, of course, add a cast to the rhs too, but it's not necessary, so I left it off. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22786#discussion_r1909965917 From tweidmann at openjdk.org Fri Jan 10 08:08:14 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 08:08:14 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v15] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. 
Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java Co-authored-by: Andrey Turbanov ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/e90223b9..9f9cc47a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=13-14 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Fri Jan 10 08:08:14 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 08:08:14 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v14] In-Reply-To: <8u1qwkCip7J8liiArZhJ5T-UxbBl9Q4VYkoFaDIwFkE=.5a024998-c034-4612-92cc-c949352ee448@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <8u1qwkCip7J8liiArZhJ5T-UxbBl9Q4VYkoFaDIwFkE=.5a024998-c034-4612-92cc-c949352ee448@github.com> Message-ID: On Thu, 9 Jan 2025 20:28:49 GMT, Andrey Turbanov wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Update Generators.java > > test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 420: > >> 418: // normal restrictions >> 419: mockSource.checkEmpty().enqueueInteger(4, 6, 4); >> 420: var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5),2, 5); > > Suggestion: > > var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5), 2, 5); Thanks for spotting this! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1909967167 From rehn at openjdk.org Fri Jan 10 08:33:59 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 10 Jan 2025 08:33:59 GMT Subject: RFR: 8347366: RISC-V: Add extension asserts for CMO instructions [v2] In-Reply-To: References: Message-ID: > Hi please consider this minor improvement. > > Sanity tested. > > Thanks, Robbin Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23015/files - new: https://git.openjdk.org/jdk/pull/23015/files/ab936267..66d53aac Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23015&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23015&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23015.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23015/head:pull/23015 PR: https://git.openjdk.org/jdk/pull/23015 From rehn at openjdk.org Fri Jan 10 08:33:59 2025 From: rehn at openjdk.org (Robbin Ehn) Date: Fri, 10 Jan 2025 08:33:59 GMT Subject: RFR: 8347366: RISC-V: Add extension asserts for CMO instructions [v2] In-Reply-To: <74lFiCjnfKX2scrJrE6EXPs1gqPksSCnFwRy53nUsqk=.0b16c89d-e239-4c5c-9285-0700566b9647@github.com> References: <74lFiCjnfKX2scrJrE6EXPs1gqPksSCnFwRy53nUsqk=.0b16c89d-e239-4c5c-9285-0700566b9647@github.com> Message-ID: On Fri, 10 Jan 2025 02:43:52 GMT, Fei Yang wrote: >> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 3094: > >> 3092: >> 3093: // This instruction have some security implication. >> 3094: // At this time it's not likley to be enable for user mode. > > Suggestion: `// At this time it's not likely to be enabled for user mode.` Thanks, updated! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23015#discussion_r1909993055 From aturbanov at openjdk.org Fri Jan 10 08:37:42 2025 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Fri, 10 Jan 2025 08:37:42 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v14] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <8u1qwkCip7J8liiArZhJ5T-UxbBl9Q4VYkoFaDIwFkE=.5a024998-c034-4612-92cc-c949352ee448@github.com> Message-ID: On Fri, 10 Jan 2025 08:04:31 GMT, Theo Weidmann wrote: >> test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 420: >> >>> 418: // normal restrictions >>> 419: mockSource.checkEmpty().enqueueInteger(4, 6, 4); >>> 420: var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5),2, 5); >> >> Suggestion: >> >> var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5), 2, 5); > > Thanks for spotting this! There are more missed spaces in this method `testSafeRestrict`. Let's fix them all ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1909998131 From shade at openjdk.org Fri Jan 10 08:45:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Jan 2025 08:45:44 GMT Subject: [jdk24] RFR: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 09:34:13 GMT, Aleksey Shipilev wrote: > Fixes the JDK 24 regression for standalone CTW runner. Thanks for review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22964#issuecomment-2582068510 From shade at openjdk.org Fri Jan 10 08:45:44 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 10 Jan 2025 08:45:44 GMT Subject: [jdk24] Integrated: 8347127: CTW fails to build after JDK-8334733 In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 09:34:13 GMT, Aleksey Shipilev wrote: > Fixes the JDK 24 regression for standalone CTW runner. 
This pull request has now been integrated. Changeset: 41630c5c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/41630c5c32134b31351fcac3f9c33ad3a7f4df6c Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8347127: CTW fails to build after JDK-8334733 Reviewed-by: phh Backport-of: e413fc643c4a58e3c46d81025c3ac9fbf89db4b9 ------------- PR: https://git.openjdk.org/jdk/pull/22964 From chagedorn at openjdk.org Fri Jan 10 08:54:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 Jan 2025 08:54:47 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi [v3] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 13:04:25 GMT, Roland Westrelin wrote: >> The assert fires during split thru phi because a call to `Identity` >> returns a new node (a constant null pointer). That happens because a >> `Load`, once pushed thru phi, can be constant folded because it loads >> from a newly allocated array. `Identity` shouldn't return new >> nodes. When split thru phi runs, in this case, `Value` should be the >> one returning constant null, not `Identity`. There is logic for that >> in `LoadNode::Value` but it's after some other checks that cause >> `Value` to return too early. >> >> To fix this, I propose reordering checks in `LoadNode::Value`. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/memnode.cpp > > Co-authored-by: Christian Hagedorn Testing looks good! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22818#issuecomment-2582087992 From tweidmann at openjdk.org Fri Jan 10 08:55:49 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 08:55:49 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 17:18:33 GMT, Quan Anh Mai wrote: >> @merykitty Thank you for your detailed write-up! @chhagedorn and I talked about it and we also agree with you that there seems to be some fundamental flaw here that needs to be addressed. We should definitely address it soon. It looks like a bigger endeavor though and we should probably file your write-up as RFE so it does not get buried here. >> >> Do you think we should give up on this point fix then? Or do you think it's fine if we merge it and address the underlying cause separately? We still believe that there's no harm in applying this band-aid patch in the way I proposed, while, of course, we have to address the underlying issue here too. >> >> @rwestrel added pin_array_access_node. Maybe you also want to weigh in on this? > > @theoweidmannoracle I think this fix is fine, please go ahead > >> It looks like a bigger endeavor though and we should probably file your write-up as RFE so it does not get buried here. > > I created https://bugs.openjdk.org/browse/JDK-8347365 @merykitty Thanks for opening the RFE! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2582086478 From duke at openjdk.org Fri Jan 10 08:55:50 2025 From: duke at openjdk.org (duke) Date: Fri, 10 Jan 2025 08:55:50 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 08:48:14 GMT, Theo Weidmann wrote: >> Fixes a bug in loop predication where not strictly invariant tests involving divisions or modulo are pulled out of the loop. >> >> The bug can be seen in this code: >> >> >> public class Reduced { >> static int iArr[] = new int[100]; >> >> public static void main(String[] strArr) { >> for (int i = 0; i < 10000; i++) { >> test(); >> } >> } >> >> static void test() { >> int i1 = 0; >> >> for (int i4 : iArr) { >> i4 = i1; >> try { >> iArr[0] = 1 / i4; >> i4 = iArr[2 / i4]; // Source of the crash >> } catch (ArithmeticException a_e) { >> } >> } >> } >> } >> >> >> The crucial element is the division `2 / i4`. Since it is used to access an array, it is the input to a range check. See node 230: >> Screenshot 2024-12-11 at 15 14 47 >> >> Loop predication will try to pull this range check together with its input, the division, before the `for` loop. Due to a bug in Invariance::compute_invariance loop predication is allowed to do so, which results in the division being pulled out without its non-zero check. 322 is a clone of 230 placed before the loop head without any zero check for the divisor: >> >> Screenshot 2024-12-11 at 15 11 48 >> >> >> More specifically, this bug occurs because 230's zero check (174 If) is not its direct control. Between the zero check and the division is another unrelated check (293 RangeCheck), which can be hoisted: >> >> Screenshot 2024-12-12 at 09 14 37 >> >> Due to the way the Invariance class works, a check that can be hoisted will be marked as invariant. 
Then, to determine if any given node is invariant, Invariance::compute_invariance checks if all its inputs are invariant: >> >> https://github.com/openjdk/jdk/blob/ceb4366ebf02f64165acc4a23195e9e3a7398a5c/src/hotspot/share/opto/loopPredicate.cpp#L456-L475 >> >> Therefore, when recursively traversing the inputs for 230 Div, the hoisted, unrelated check 293 RangeCheck is hit before the zero check. As that check has been hoisted before already, it is marked invariant and `all_inputs_invariant` will be set to true. (All other inputs are... > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Combine test files @theoweidmannoracle Your change (at version 8da7cb585a422fde0d6b0926d0c11f26bf5ba6db) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2582088204 From tweidmann at openjdk.org Fri Jan 10 09:01:16 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 09:01:16 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v16] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: <89yHic-uESOnHEWlgbGBFANceqk6mF6qvPRHoHv9niw=.05ca4b72-3e43-4bb6-921d-8d90e994c823@github.com> > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. 
For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Fix spacing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/9f9cc47a..9b05b639 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=14-15 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From tweidmann at openjdk.org Fri Jan 10 09:01:17 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 09:01:17 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v14] In-Reply-To: References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <8u1qwkCip7J8liiArZhJ5T-UxbBl9Q4VYkoFaDIwFkE=.5a024998-c034-4612-92cc-c949352ee448@github.com> Message-ID: On Fri, 10 Jan 2025 08:04:31 GMT, Theo Weidmann wrote: >> test/hotspot/jtreg/testlibrary_tests/generators/tests/TestGenerators.java line 420: >> >>> 418: // normal restrictions >>> 419: 
mockSource.checkEmpty().enqueueInteger(4, 6, 4); >>> 420: var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5),2, 5); >> >> Suggestion: >> >> var g1 = mockGS.safeRestrictInt(mockGS.uniformInts(4, 5), 2, 5); > > Thanks for spotting this! Thanks, I fixed it. IntelliJ's "inlay hints" makes this really hard to see. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22941#discussion_r1910023554 From tweidmann at openjdk.org Fri Jan 10 09:02:59 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 09:02:59 GMT Subject: Integrated: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:09:26 GMT, Theo Weidmann wrote: > Fixes a bug in loop predication where not strictly invariant tests involving divisions or modulo are pulled out of the loop. > > The bug can be seen in this code: > > > public class Reduced { > static int iArr[] = new int[100]; > > public static void main(String[] strArr) { > for (int i = 0; i < 10000; i++) { > test(); > } > } > > static void test() { > int i1 = 0; > > for (int i4 : iArr) { > i4 = i1; > try { > iArr[0] = 1 / i4; > i4 = iArr[2 / i4]; // Source of the crash > } catch (ArithmeticException a_e) { > } > } > } > } > > > The crucial element is the division `2 / i4`. Since it is used to access an array, it is the input to a range check. See node 230: > Screenshot 2024-12-11 at 15 14 47 > > Loop predication will try to pull this range check together with its input, the division, before the `for` loop. Due to a bug in Invariance::compute_invariance loop predication is allowed to do so, which results in the division being pulled out without its non-zero check. 322 is a clone of 230 placed before the loop head without any zero check for the divisor: > > Screenshot 2024-12-11 at 15 11 48 > > > More specifically, this bug occurs because 230's zero check (174 If) is not its direct control. 
Between the zero check and the division is another unrelated check (293 RangeCheck), which can be hoisted:
>
> [image: Screenshot 2024-12-12 at 09 14 37]
>
> Due to the way the Invariance class works, a check that can be hoisted will be marked as invariant. Then, to determine if any given node is invariant, Invariance::compute_invariance checks if all its inputs are invariant:
>
> https://github.com/openjdk/jdk/blob/ceb4366ebf02f64165acc4a23195e9e3a7398a5c/src/hotspot/share/opto/loopPredicate.cpp#L456-L475
>
> Therefore, when recursively traversing the inputs for 230 Div, the hoisted, unrelated check 293 RangeCheck is hit before the zero check. As that check has been hoisted before already, it is marked invariant and `all_inputs_invariant` will be set to true. (All other inputs are also trivially invariant as they are constant.)
>
> To fix this, Invariance::compute_invarianc...

This pull request has now been integrated.

Changeset: 55c6904e
Author: Theo Weidmann
URL: https://git.openjdk.org/jdk/commit/55c6904e8f3d02530749bf28f2cc966e8983a984
Stats: 90 lines in 2 files changed: 89 ins; 0 del; 1 mod

8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check

Reviewed-by: chagedorn, qamai, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/22666

From duke at openjdk.org Fri Jan 10 09:12:55 2025
From: duke at openjdk.org (kuaiwei)
Date: Fri, 10 Jan 2025 09:12:55 GMT
Subject: RFR: 8347405: MergeStores with reverse bytes order value
Message-ID:

This patch enhances the MergeStores optimization to support merging stores of a value in reverse byte order.
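To illustrate the kind of store pattern involved, here is a minimal Java sketch with hypothetical class and method names (it is not the benchmark's code). Writing an `int` into a `byte[]` most significant byte first leaves the same bytes in memory as a single 4-byte little-endian store of `Integer.reverseBytes(v)`. That this is exactly the shape the patch recognizes is an assumption on my part.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ReverseBytesStoreSketch {
    // Four separate byte stores, most significant byte first (big-endian layout).
    static void setIntBigEndianBytewise(byte[] a, int off, int v) {
        a[off]     = (byte) (v >> 24);
        a[off + 1] = (byte) (v >> 16);
        a[off + 2] = (byte) (v >> 8);
        a[off + 3] = (byte) v;
    }

    // One 4-byte little-endian store of the byte-reversed value.
    static void setIntReversedSingleStore(byte[] a, int off, int v) {
        ByteBuffer.wrap(a)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .putInt(off, Integer.reverseBytes(v));
    }

    public static void main(String[] args) {
        byte[] x = new byte[8];
        byte[] y = new byte[8];
        setIntBigEndianBytewise(x, 2, 0x11223344);
        setIntReversedSingleStore(y, 2, 0x11223344);
        // Both arrays now hold 0x11, 0x22, 0x33, 0x44 at offsets 2..5.
        System.out.println(Arrays.equals(x, y)); // prints "true"
    }
}
```

On a little-endian machine, the second form corresponds to one byte-reverse instruction followed by one wide store, which is why merging the four narrow stores can pay off.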
Below are the benchmark results before and after the patch:

On aliyun g8a (aarch64)

| test | before | after | ratio |
|---|---|---|---|
| MergeStoreBench.setCharBS | 5669.655000 | 5669.566000 | 0.00 % |
| MergeStoreBench.setCharBV | 5516.911000 | 5516.273000 | 0.01 % |
| MergeStoreBench.setCharC | 5578.644000 | 5552.809000 | 0.47 % |
| MergeStoreBench.setCharLS | 5782.140000 | 5779.264000 | 0.05 % |
| MergeStoreBench.setCharLV | 5496.403000 | 5499.195000 | -0.05 % |
| MergeStoreBench.setIntB | 6087.703000 | 2768.385000 | 119.90 % |
| MergeStoreBench.setIntBU | 6733.813000 | 2950.240000 | 128.25 % |
| MergeStoreBench.setIntBV | 1362.233000 | 1361.821000 | 0.03 % |
| MergeStoreBench.setIntL | 2834.785000 | 2833.042000 | 0.06 % |
| MergeStoreBench.setIntLU | 2947.145000 | 2946.874000 | 0.01 % |
| MergeStoreBench.setIntLV | 5506.791000 | 5506.229000 | 0.01 % |
| MergeStoreBench.setIntRB | 7634.279000 | 5611.058000 | 36.06 % |
| MergeStoreBench.setIntRBU | 7766.737000 | 5551.281000 | 39.91 % |
| MergeStoreBench.setIntRL | 5689.793000 | 5689.385000 | 0.01 % |
| MergeStoreBench.setIntRLU | 5628.287000 | 5628.789000 | -0.01 % |
| MergeStoreBench.setIntRU | 5536.039000 | 5534.910000 | 0.02 % |
| MergeStoreBench.setIntU | 5595.363000 | 5567.810000 | 0.49 % |
| MergeStoreBench.setLongB | 13722.671000 | 6811.098000 | 101.48 % |
| MergeStoreBench.setLongBU | 13728.844000 | 4280.240000 | 220.75 % |
| MergeStoreBench.setLongBV | 2785.255000 | 2785.949000 | -0.02 % |
| MergeStoreBench.setLongL | 5714.615000 | 5710.402000 | 0.07 % |
| MergeStoreBench.setLongLU | 4128.746000 | 4129.324000 | -0.01 % |
| MergeStoreBench.setLongLV | 2793.125000 | 2794.438000 | -0.05 % |
| MergeStoreBench.setLongRB | 14465.223000 | 7015.050000 | 106.20 % |
| MergeStoreBench.setLongRBU | 14546.954000 | 6173.210000 | 135.65 % |
| MergeStoreBench.setLongRL | 6816.145000 | 6813.348000 | 0.04 % |
| MergeStoreBench.setLongRLU | 4289.445000 | 4284.239000 | 0.12 % |
| MergeStoreBench.setLongRU | 3132.471000 | 3133.093000 | -0.02 % |
| MergeStoreBench.setLongU | 3086.779000 | 3087.298000 | -0.02 % |

AMD EPYC 9T24 96-Core Processor

| test | before | after | ratio |
|---|---|---|---|
| MergeStoreBench.setCharBS | 5317.887000 | 5327.174000 | -0.17 % |
| MergeStoreBench.setCharBV | 3088.976000 | 3091.006000 | -0.07 % |
| MergeStoreBench.setCharC | 3388.877000 | 3380.690000 | 0.24 % |
| MergeStoreBench.setCharLS | 4584.065000 | 4588.369000 | -0.09 % |
| MergeStoreBench.setCharLV | 2250.598000 | 2252.032000 | -0.06 % |
| MergeStoreBench.setIntB | 6833.492000 | 2277.048000 | 200.10 % |
| MergeStoreBench.setIntBU | 10100.114000 | 4599.712000 | 119.58 % |
| MergeStoreBench.setIntBV | 571.860000 | 571.757000 | 0.02 % |
| MergeStoreBench.setIntL | 2239.958000 | 2239.086000 | 0.04 % |
| MergeStoreBench.setIntLU | 4565.547000 | 4596.785000 | -0.68 % |
| MergeStoreBench.setIntLV | 590.695000 | 589.769000 | 0.16 % |
| MergeStoreBench.setIntRB | 8161.235000 | 3051.716000 | 167.43 % |
| MergeStoreBench.setIntRBU | 10395.762000 | 6216.037000 | 67.24 % |
| MergeStoreBench.setIntRL | 2555.976000 | 2554.368000 | 0.06 % |
| MergeStoreBench.setIntRLU | 5244.833000 | 5258.066000 | -0.25 % |
| MergeStoreBench.setIntRU | 569.180000 | 569.181000 | -0.00 % |
| MergeStoreBench.setIntU | 592.201000 | 593.928000 | -0.29 % |
| MergeStoreBench.setLongB | 17983.730000 | 4889.856000 | 267.78 % |
| MergeStoreBench.setLongBU | 19592.106000 | 4688.507000 | 317.88 % |
| MergeStoreBench.setLongBV | 1142.753000 | 1143.237000 | -0.04 % |
| MergeStoreBench.setLongL | 4288.793000 | 4283.908000 | 0.11 % |
| MergeStoreBench.setLongLU | 4459.737000 | 4452.856000 | 0.15 % |
| MergeStoreBench.setLongLV | 1166.993000 | 1167.104000 | -0.01 % |
| MergeStoreBench.setLongRB | 17946.584000 | 4348.289000 | 312.73 % |
| MergeStoreBench.setLongRBU | 20017.846000 | 5646.924000 | 254.49 % |
| MergeStoreBench.setLongRL | 4895.993000 | 4899.112000 | -0.06 % |
| MergeStoreBench.setLongRLU | 4652.172000 | 4706.134000 | -1.15 % |
| MergeStoreBench.setLongRU | 1144.522000 | 1144.203000 | 0.03 % |
| MergeStoreBench.setLongU | 1172.038000 | 1171.341000 | 0.06 % |

-------------

Commit messages:
 - Remove unused test option
 - 8347405: MergeStores with reverse bytes order value

Changes: https://git.openjdk.org/jdk/pull/23030/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8347405
Stats: 109 lines in 3 files
changed: 86 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/23030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23030/head:pull/23030 PR: https://git.openjdk.org/jdk/pull/23030 From tweidmann at openjdk.org Fri Jan 10 09:16:41 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 09:16:41 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: <_bM_hl_t4Nis8xCcAJKrJD1c8N_lFoNL94yIJnR8SNs=.e1975c36-221e-493d-9030-3b181a317e29@github.com> On Wed, 8 Jan 2025 13:06:51 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix test name > > test/hotspot/jtreg/compiler/stringopts/TestFluidAndNonFluid.java line 27: > >> 25: * @test >> 26: * @bug 8341696 >> 27: * @requires vm.compiler2.enabled > > Not sure if I asked about this already: do we need this C2 restriction? The IR framework only checks IR rules for C2, but the test could still do value verification for other settings where C2 is not available. Unintentional. Thanks for spotting it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1910049585 From chagedorn at openjdk.org Fri Jan 10 09:20:42 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 10 Jan 2025 09:20:42 GMT Subject: RFR: 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor [v2] In-Reply-To: References: Message-ID: On Mon, 6 Jan 2025 10:08:13 GMT, Christian Hagedorn wrote: >> This patch is a follow up to the clean-ups done with [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945) and introduces a new predicate visitor for Loop Unswitching to update the last remaining custom predicate cloning code. 
>> >> This patch includes the following: >> >> - New `CloneUnswitchedLoopPredicatesVisitor` class which delegates the cloning work to a new `ClonePredicateToTargetLoop` class. >> - We walk the predicate chain in the `PredicateIterator` and call the `CloneUnswitchedLoopPredicatesVisitor` for each visited predicate. Then we clone the predicate on the fly to the target loop. >> - New `ClonePredicateToTargetLoop` class: >> - Clones Parse Predicates >> - Clones Template Assertion Predicates >> - Includes rewiring of control dependent data nodes >> - Rewires the cloned predicates to the target loop with new `TargetLoopPredicateChain` class: >> - Keeps track of the current chain head, which is the target loop itself when the chain is still empty. >> - Each time a new predicate is inserted at the target loop, the old predicate chain head is set as output of the new predicate. >> - An example is shown as class comment at `TargetLoopPredicateChain`. >> - I plan to reuse this class later again when also updating `CreateAssertionPredicatesVisitor` which is done when we tackle the actual still remaining Assertion Predicate bugs. >> - Removal of custom predicate cloning code found in `PhaseIdealLoop`. >> - Changed steps performed in Loop Unswitching from: >> 1. Clone loop >> 2. Clone predicates and insert them below the unswitched loop selector If projections >> 3. Connect the cloned predicates to the unswitched loops >> >> to: >> >> 1. Clone loop >> 2. Connect unswitched loop selector If projections to unswitched loops such that they are now the new loop entries >> 3. Clone predicates and insert them between the unswitched loop selector If projections and the unswitched loops >> - Rename/update `get_template_assertion_predicates()`/`TemplateAssertionPredicateCollector` to reflect the only use left. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' into JDK-8344035 > - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22828#issuecomment-2582144502 From tweidmann at openjdk.org Fri Jan 10 09:37:04 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 09:37:04 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v5] In-Reply-To: References: Message-ID: > Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. > > For example, this basic case was not optimized before and is optimized with this PR: > > > StringBuilder sb = new StringBuilder(); > sb.append("a"); > sb.append(a); > return sb.toString(); Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Make code more clear ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22537/files - new: https://git.openjdk.org/jdk/pull/22537/files/b0a1b226..c82970fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=03-04 Stats: 41 lines in 3 files changed: 13 ins; 1 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/22537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22537/head:pull/22537 PR: https://git.openjdk.org/jdk/pull/22537 From tweidmann at openjdk.org Fri Jan 10 09:37:04 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 09:37:04 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 13:23:04 GMT, Emanuel Peter wrote: >> Theo Weidmann has updated the pull request 
incrementally with one additional commit since the last revision: >> >> Fix test name > > Looks promising. Though I did not work with string-opts before, so a little hard for me to give a proper review. If you want/need me to review, a few more annotations with github comments would help. Otherwise I'll just leave it at the drive-by comments ;) @eme64 Thanks for your inputs. I added comments to document and explain the aspects you asked about, renamed symbols and replaced the breaks with more clear return statements. It should be clearer now hopefully. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22537#issuecomment-2582184145 From epeter at openjdk.org Fri Jan 10 09:59:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Jan 2025 09:59:51 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v16] In-Reply-To: <89yHic-uESOnHEWlgbGBFANceqk6mF6qvPRHoHv9niw=.05ca4b72-3e43-4bb6-921d-8d90e994c823@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <89yHic-uESOnHEWlgbGBFANceqk6mF6qvPRHoHv9niw=.05ca4b72-3e43-4bb6-921d-8d90e994c823@github.com> Message-ID: On Fri, 10 Jan 2025 09:01:16 GMT, Theo Weidmann wrote: >> This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. 
>>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Fix spacing Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22941#pullrequestreview-2542014630 From fyang at openjdk.org Fri Jan 10 10:05:45 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Jan 2025 10:05:45 GMT Subject: RFR: 8347366: RISC-V: Add extension asserts for CMO instructions [v2] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 08:33:59 GMT, Robbin Ehn wrote: >> Hi please consider this minor improvement. >> >> Sanity tested. >> >> Thanks, Robbin > > Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23015#pullrequestreview-2542029312 From swen at openjdk.org Fri Jan 10 10:45:41 2025 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 10 Jan 2025 10:45:41 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei wrote: > This patch enhance MergeStores optimization to support merge value with reverse byte order. 
> > Below is benchmark result before and after the patch: > > On aliyun g8a (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU 
|3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... @eme64 Can you help review this PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2582393288 From swen at openjdk.org Fri Jan 10 10:48:53 2025 From: swen at openjdk.org (Shaojin Wen) Date: Fri, 10 Jan 2025 10:48:53 GMT Subject: RFR: 8343629: More MergeStore benchmark In-Reply-To: <0bDxaCchizjbuutDlPOANL0F8f9fftZy-XLJ2XodJlo=.c79d363e-a078-4538-9ff5-47087c60fe01@github.com> References: <6B34f81JucswxU43rqcM1jF1UDoVhYs7ukuClJvYKNw=.6c7cc0a1-fe21-4928-9ee6-26deb1b189eb@github.com> <3imwZoYxFhWbvIM871w0bRVtAaZRVSrvpr47GxOtWGI=.a0184988-0266-45ea-af48-becc568ee5bd@github.com> <0bDxaCchizjbuutDlPOANL0F8f9fftZy-XLJ2XodJlo=.c79d363e-a078-4538-9ff5-47087c60fe01@github.com> Message-ID: On Thu, 14 Nov 2024 06:57:01 GMT, Emanuel Peter wrote: > These are all good ideas, and I already discussed it offline with @cl4es . I have lots of tasks I'm working on, and this is on the lowest tier of priorities for me personally. But if someone else wants to jump on that, then I can coach and review. > > We could also be interested in "MergeCopy", i.e. load->store patterns. Maybe this just ends up being SuperWord again, but this time for straight line code. PR #23030 has been submitted to add support for BigEndian MergeStore ------------- PR Comment: https://git.openjdk.org/jdk/pull/21659#issuecomment-2582402580 From dfenacci at openjdk.org Fri Jan 10 10:58:09 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 10 Jan 2025 10:58:09 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Message-ID: This reverts _8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)_ (commit 633fad8). 
The fix increased the code cache but was incomplete and didn't fix the underlying issue with code cache allocation potentially crashing. ------------- Commit messages: - JDK-8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Changes: https://git.openjdk.org/jdk/pull/23031/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23031&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347407 Stats: 44 lines in 9 files changed: 5 ins; 26 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23031.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23031/head:pull/23031 PR: https://git.openjdk.org/jdk/pull/23031 From bulasevich at openjdk.org Fri Jan 10 11:16:40 2025 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 10 Jan 2025 11:16:40 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v6] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 9 Jan 2025 01:23:18 GMT, Dean Long wrote: > I just noticed the old code would have replaced the oop_Relocation with a internal_word_Relocation for the NearCpool==false case. How did that ever work correctly? Thanks for review! NearCpool==false was not functioning correctly. I had considered a separate [fix](https://github.com/openjdk/jdk/pull/22448/files) for it, but with the current change, the incorrect code is eliminated. 
$ ~/jdk-23/bin/java -XX:-NearCpool -XX:+UseShenandoahGC -version # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x0000ffff6f94c190, pid=95263, tid=95264 # # JRE version: OpenJDK Runtime Environment (23.0+38) (build 23+38) # Java VM: OpenJDK 64-Bit Server VM (23+38, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, shenandoah gc, linux-aarch64) # Problematic frame: # j java.lang.module.ModuleDescriptor.(Ljava/lang/String;Ljava/lang/module/ModuleDescriptor$Version;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/util/Set;Ljava/lang/String;IZ)V+29 java.base # # Core dump will be written. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2582462847 From epeter at openjdk.org Fri Jan 10 11:21:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Jan 2025 11:21:44 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei wrote: > This patch enhance MergeStores optimization to support merge value with reverse byte order. 
> > Below is benchmark result before and after the patch: > > On aliyun g8a (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU 
|3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... This looks very promising, thanks for working on this! Makes me very happy that people are extending it ? I have a few comments and suggestions below. Can you please link the JBS issue to the other relevant RFE's for MergeStores? Is there no way to reverse shorts and ints? src/hotspot/share/opto/memnode.cpp line 2794: > 2792: StoreNode* const _store; > 2793: enum DataOrder { Unknown, Forward, Reverse}; > 2794: DataOrder _value_order; Suggestion: // State machine with initial state Unknown // Allowed transitions: // Unknown -> Forward // Unknown -> Backward // Forward -> Forward // Backward -> Backward enum DataOrder { Unknown, Forward, Reverse }; DataOrder _value_order; Also: call it either consistently data-order or value-order ;) src/hotspot/share/opto/memnode.cpp line 2956: > 2954: } > 2955: > 2956: bool MergePrimitiveStores::is_adjacent_input_pair(const Node* n1, const Node* n2, const int memory_size) { Nit: This may be a little "nitpicky". But I don't like `is_...` methods that have side-effects. That is why I'd be sad to see the `const` go. src/hotspot/share/opto/memnode.cpp line 2993: > 2991: } > 2992: > 2993: // initialize value_order once Suggestion: // Initial state "Unknown": check for transition to Forward or Reverse. src/hotspot/share/opto/memnode.cpp line 2996: > 2994: if (_value_order == DataOrder::Unknown) { > 2995: if (shift_n1 < shift_n2) { > 2996: _value_order = DataOrder::Forward; Suggestion: _value_order = DataOrder::Forward; // First pair has Forward order. src/hotspot/share/opto/memnode.cpp line 3002: > 3000: Matcher::match_rule_supported(Op_ReverseBytesL)) { > 3001: _value_order = DataOrder::Reverse; // only support reverse bytes > 3002: #endif Can you leave a comment why we only need this for little endian? It seems you are now generating `ReverseByte` nodes on any platform, right? 
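The DataOrder state machine proposed in the review comment above (Unknown to Forward or Reverse, then no further change) can be modelled in a few lines of Java. This is a simplified sketch and not the HotSpot C++ code: the class `ValueOrderSketch`, the method `classify`, and the idea of passing the shift amounts as an `int[]` are all assumptions made for illustration. The real code also checks that adjacent shifts differ by exactly the store size, which is left out here.

```java
// Hypothetical Java model of the DataOrder state machine; not HotSpot code.
class ValueOrderSketch {
    enum DataOrder { UNKNOWN, FORWARD, REVERSE }

    // shifts[i] is the shift amount applied to the value stored at the i-th
    // (increasing) memory position. Returns FORWARD or REVERSE if every
    // adjacent pair agrees on the order, and UNKNOWN if the order is mixed
    // or cannot be determined, meaning the stores should not be merged.
    static DataOrder classify(int[] shifts) {
        DataOrder order = DataOrder.UNKNOWN;
        for (int i = 0; i + 1 < shifts.length; i++) {
            if (shifts[i] == shifts[i + 1]) {
                return DataOrder.UNKNOWN; // no order can be derived from equal shifts
            }
            DataOrder pair = shifts[i] < shifts[i + 1] ? DataOrder.FORWARD : DataOrder.REVERSE;
            if (order == DataOrder.UNKNOWN) {
                order = pair; // initial state: the first pair decides the order
            } else if (order != pair) {
                return DataOrder.UNKNOWN; // mixed Forward and Reverse is not allowed
            }
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(classify(new int[] {0, 8, 16, 24}));
        System.out.println(classify(new int[] {24, 16, 8, 0}));
        System.out.println(classify(new int[] {0, 8, 0}));
    }
}
```

Returning UNKNOWN for the mixed case is a choice made for this sketch; the patch itself simply gives up on the merge at that point.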
src/hotspot/share/opto/memnode.cpp line 3010: > 3008: if ((_value_order == DataOrder::Forward && shift_n1 > shift_n2) || > 3009: (_value_order == DataOrder::Reverse && shift_n1 < shift_n2)) { > 3010: // wrong order Suggestion: // Wrong order: mixed Forward and Reverse not allowed. src/hotspot/share/opto/memnode.cpp line 3044: > 3042: shift_out = 0; > 3043: return true; > 3044: } Can you tell me why you added this? src/hotspot/share/opto/memnode.cpp line 3270: > 3268: "merged_input_value is either int or long, and new_memory_size is small enough"); > 3269: > 3270: if (_value_order == DataOrder::Reverse) { Suggestion: if (_value_order == DataOrder::Reverse) { assert(_store->memory_size() == 1, "only implemented for bytes"); That would be correct, right? src/hotspot/share/opto/memnode.cpp line 3276: > 3274: merged_input_value = _phase->transform(new ReverseBytesINode(nullptr, merged_input_value)); > 3275: } else { > 3276: return nullptr; Suggestion: // . return nullptr; ------------- PR Review: https://git.openjdk.org/jdk/pull/23030#pullrequestreview-2542142687 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910222532 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910203556 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910226335 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910227487 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910207983 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910224201 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910208790 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910232296 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910230898 From epeter at openjdk.org Fri Jan 10 11:21:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Jan 2025 11:21:45 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order 
value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 10:56:59 GMT, Emanuel Peter wrote: >> This patch enhance MergeStores optimization to support merge value with reverse byte order. >> >> Below is benchmark result before and after the patch: >> >> On aliyun g8a (aarch64) >> |name | before | score2 | ratio | >> |---|---|---|---| >> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| >> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| >> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| >> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| >> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| >> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| >> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| >> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| >> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| >> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| >> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| >> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| >> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| >> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| >> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| >> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| >> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| >> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| >> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| >> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| >> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| >> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| >> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| >> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| >> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| 
>> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| >> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| >> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| >> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| >> >> AMD EPYC 9T24 >> ... > > src/hotspot/share/opto/memnode.cpp line 3002: > >> 3000: Matcher::match_rule_supported(Op_ReverseBytesL)) { >> 3001: _value_order = DataOrder::Reverse; // only support reverse bytes >> 3002: #endif > > Can you leave a comment why we only need this for little endian? It seems you are now generating `ReverseByte` nodes on any platform, right? Ah, maybe that is because you have not `big-endian` machine to test on. We could leave this to someone who cares about big-endian - and they could also adjust the tests accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910213753 From epeter at openjdk.org Fri Jan 10 11:21:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 10 Jan 2025 11:21:46 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 11:01:44 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 3002: >> >>> 3000: Matcher::match_rule_supported(Op_ReverseBytesL)) { >>> 3001: _value_order = DataOrder::Reverse; // only support reverse bytes >>> 3002: #endif >> >> Can you leave a comment why we only need this for little endian? It seems you are now generating `ReverseByte` nodes on any platform, right? > > Ah, maybe that is because you have not `big-endian` machine to test on. We could leave this to someone who cares about big-endian - and they could also adjust the tests accordingly. Suggestion: #ifdef VM_LITTLE_ENDIAN // For now, we only implement Reverse order for little-endian, and only for bytes. 
} else if (memory_size == 1 && Matcher::match_rule_supported(Op_ReverseBytesI) && Matcher::match_rule_supported(Op_ReverseBytesL)) { _value_order = DataOrder::Reverse; // First pair has Reverse order. #endif ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910228511 From dfenacci at openjdk.org Fri Jan 10 11:37:23 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 10 Jan 2025 11:37:23 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: Message-ID: > This reverts _8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)_ (commit 633fad8). > The fix increased the code cache but was incomplete and didn't fix the underlying issue with code cache allocation potentially crashing (for the latter we have JBS issue [JDK-8339700](https://bugs.openjdk.org/browse/JDK-8339700)). 
Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  JDK-8347407: fix JBS issue number in problem list file

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23031/files
  - new: https://git.openjdk.org/jdk/pull/23031/files/fa668352..8c228b5b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23031&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23031&range=00-01

Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/23031.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23031/head:pull/23031

PR: https://git.openjdk.org/jdk/pull/23031

From duke at openjdk.org Fri Jan 10 12:49:40 2025
From: duke at openjdk.org (kuaiwei)
Date: Fri, 10 Jan 2025 12:49:40 GMT
Subject: RFR: 8347405: MergeStores with reverse bytes order value
In-Reply-To:
References:
Message-ID:

On Fri, 10 Jan 2025 11:14:09 GMT, Emanuel Peter wrote:

>> Ah, maybe that is because you have not `big-endian` machine to test on. We could leave this to someone who cares about big-endian - and they could also adjust the tests accordingly.
>
> Suggestion:
>
> #ifdef VM_LITTLE_ENDIAN
> // For now, we only implement Reverse order for little-endian, and only for bytes.
> } else if (memory_size == 1 &&
> Matcher::match_rule_supported(Op_ReverseBytesI) &&
> Matcher::match_rule_supported(Op_ReverseBytesL)) {
> _value_order = DataOrder::Reverse; // First pair has Reverse order.
> #endif

> Ah, maybe that is because you have not `big-endian` machine to test on. We could leave this to someone who cares about big-endian - and they could also adjust the tests accordingly.

Yes, I'm not sure it works correctly on a big-endian machine, so I only enabled it for little-endian mode.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910332094 From duke at openjdk.org Fri Jan 10 12:53:38 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 10 Jan 2025 12:53:38 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 10:57:35 GMT, Emanuel Peter wrote: >> This patch enhance MergeStores optimization to support merge value with reverse byte order. >> >> Below is benchmark result before and after the patch: >> >> On aliyun g8a (aarch64) >> |name | before | score2 | ratio | >> |---|---|---|---| >> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| >> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| >> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| >> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| >> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| >> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| >> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| >> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| >> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| >> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| >> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| >> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| >> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| >> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| >> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| >> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| >> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| >> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| >> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| >> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| >> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 
%| >> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| >> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| >> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| >> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| >> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| >> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| >> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| >> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| >> >> AMD EPYC 9T24 >> ... > > src/hotspot/share/opto/memnode.cpp line 3044: > >> 3042: shift_out = 0; >> 3043: return true; >> 3044: } > > Can you tell me why you added this? While debugging a case, I found that the base node is a LoadNode and the remaining shift nodes are derived from it. Because the LoadNode is not recognized as a shift node, none of them could be merged. I think any node of Int or Long type can be parsed as "base << 0". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910337160 From fyang at openjdk.org Fri Jan 10 12:56:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Jan 2025 12:56:56 GMT Subject: [jdk24] RFR: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: <9DMLqpSUj4PkW-QH7aBWtHfWLkKyaSebSB2v9zIR-fA=.da9ae93e-4e02-4acf-abe8-ff5caad7ae50@github.com> On Tue, 7 Jan 2025 10:57:48 GMT, Fei Yang wrote: > Hi all, > > Same issue is there in jdk24 repo. > > This pull request contains a backport of commit [3f7052ed](https://github.com/openjdk/jdk/commit/3f7052ed7af89efd1c6977df0b4f3b95fcfec764) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Fei Yang on 7 Jan 2025 and was reviewed by Robbin Ehn and Hamlin Li. > > Thanks! Closing in favor of https://github.com/openjdk/jdk24u/pull/15.
(This is a P4 bug) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22945#issuecomment-2582646739 From fyang at openjdk.org Fri Jan 10 12:56:56 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 10 Jan 2025 12:56:56 GMT Subject: [jdk24] Withdrawn: 8346868: RISC-V: compiler/sharedstubs tests fail after JDK-8332689 In-Reply-To: References: Message-ID: On Tue, 7 Jan 2025 10:57:48 GMT, Fei Yang wrote: > Hi all, > > Same issue is there in jdk24 repo. > > This pull request contains a backport of commit [3f7052ed](https://github.com/openjdk/jdk/commit/3f7052ed7af89efd1c6977df0b4f3b95fcfec764) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Fei Yang on 7 Jan 2025 and was reviewed by Robbin Ehn and Hamlin Li. > > Thanks! This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22945 From aph at openjdk.org Fri Jan 10 13:12:40 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 Jan 2025 13:12:40 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 09:07:11 GMT, kuaiwei wrote: > This patch enhance MergeStores optimization to support merge value with reverse byte order. 
> > Below is benchmark result before and after the patch: > > On aliyun g8a (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU 
|3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... This patch should include a JMH benchmark. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2582679262 From duke at openjdk.org Fri Jan 10 13:22:50 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 10 Jan 2025 13:22:50 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v2] In-Reply-To: References: Message-ID: > This patch enhance MergeStores optimization to support merge value with reverse byte order. > > Below is benchmark result before and after the patch: > > On aliyun g8a (aarch64) > |name | before | score2 | ratio | > |---|---|---|---| > |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| > |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| > |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| > |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| > |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| > |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| > |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| > |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| > |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| > |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| > |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| > |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| > |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| > |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| > |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| > |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| > |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| > |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| > |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| > 
|MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| > |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| > |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| > |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| > |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| > |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| > |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| > |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| > |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %| > |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| > > AMD EPYC 9T24 > |name | before | after | ratio | > |---|---|---|---| > |MergeStoreBench.setChar... kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Fix as review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23030/files - new: https://git.openjdk.org/jdk/pull/23030/files/a48ce987..4262b93c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=00-01 Stats: 20 lines in 1 file changed: 7 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23030/head:pull/23030 PR: https://git.openjdk.org/jdk/pull/23030 From duke at openjdk.org Fri Jan 10 13:26:45 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 10 Jan 2025 13:26:45 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v2] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 11:08:53 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix as review comments > > src/hotspot/share/opto/memnode.cpp line 2794: > >> 2792: StoreNode* const _store; >> 2793: enum DataOrder { Unknown, Forward, Reverse}; >> 2794: DataOrder _value_order; > > 
Suggestion: > > // State machine with initial state Unknown > // Allowed transitions: > // Unknown -> Forward > // Unknown -> Backward > // Forward -> Forward > // Backward -> Backward > enum DataOrder { Unknown, Forward, Reverse }; > DataOrder _value_order; > > > Also: call it either consistently data-order or value-order ;) Thanks for the comment. I renamed the enum to ValueOrder and renamed Reverse to Backward. > src/hotspot/share/opto/memnode.cpp line 2993: > >> 2991: } >> 2992: >> 2993: // initialize value_order once > > Suggestion: > > // Initial state "Unknown": check for transition to Forward or Reverse. Fixed > src/hotspot/share/opto/memnode.cpp line 2996: > >> 2994: if (_value_order == DataOrder::Unknown) { >> 2995: if (shift_n1 < shift_n2) { >> 2996: _value_order = DataOrder::Forward; > > Suggestion: > > _value_order = DataOrder::Forward; // First pair has Forward order. Fixed > src/hotspot/share/opto/memnode.cpp line 3010: > >> 3008: if ((_value_order == DataOrder::Forward && shift_n1 > shift_n2) || >> 3009: (_value_order == DataOrder::Reverse && shift_n1 < shift_n2)) { >> 3010: // wrong order > > Suggestion: > > // Wrong order: mixed Forward and Reverse not allowed.
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910374211 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910374912 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910375233 PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910374680 From duke at openjdk.org Fri Jan 10 13:31:42 2025 From: duke at openjdk.org (kuaiwei) Date: Fri, 10 Jan 2025 13:31:42 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v2] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 11:17:54 GMT, Emanuel Peter wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix as review comments > > src/hotspot/share/opto/memnode.cpp line 3270: > >> 3268: "merged_input_value is either int or long, and new_memory_size is small enough"); >> 3269: >> 3270: if (_value_order == DataOrder::Reverse) { > > Suggestion: > > if (_value_order == DataOrder::Reverse) { > assert(_store->memory_size() == 1, "only implemented for bytes"); > > That would be correct, right? Assert added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1910380118 From aph at openjdk.org Fri Jan 10 13:31:40 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 Jan 2025 13:31:40 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 13:09:35 GMT, Andrew Haley wrote: > This patch should include a JMH benchmark. My mistake, I see the benchmark was committed in an earlier patch. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2582712772 From dfenacci at openjdk.org Fri Jan 10 14:02:32 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 10 Jan 2025 14:02:32 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: > This reverts _8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)_ (commit 633fad8). > The fix increased the code cache but was incomplete and didn't fix the underlying issue with code cache allocation potentially crashing (for the latter we have JBS issue [JDK-8339700](https://bugs.openjdk.org/browse/JDK-8339700)). Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8347407: re-fix JBS issue number in problem list file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23031/files - new: https://git.openjdk.org/jdk/pull/23031/files/8c228b5b..0c0874bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23031&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23031&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23031.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23031/head:pull/23031 PR: https://git.openjdk.org/jdk/pull/23031 From thartmann at openjdk.org Fri Jan 10 14:07:34 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 10 Jan 2025 14:07:34 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 14:02:32 GMT, Damon Fenacci wrote: >> This reverts _8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal 
crash)_ (commit 633fad8). >> The fix increased the code cache but was incomplete and didn't fix the underlying issue with code cache allocation potentially crashing (for the latter we have JBS issue [JDK-8339700](https://bugs.openjdk.org/browse/JDK-8339700)). > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8347407: re-fix JBS issue number in problem list file The extra includes are a bit weird, let's make sure to fix that again with the REDO. Looks like a clean backout, i.e. trivial, to me otherwise. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23031#pullrequestreview-2542516560 From dfenacci at openjdk.org Fri Jan 10 14:38:41 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 10 Jan 2025 14:38:41 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: <0qE7Tl_aipXjrwLz96nOUDSeBUdabBpjpKKfUtv8k_4=.fe544cc5-7f40-4d1b-8bcc-b44a106d7339@github.com> On Fri, 10 Jan 2025 14:04:38 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8347407: re-fix JBS issue number in problem list file > > The extra includes are a bit weird, let's make sure to fix that again with the REDO. Looks like a clean backout, i.e. trivial, to me otherwise. Thanks @TobiHartmann for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23031#issuecomment-2582851491 From tweidmann at openjdk.org Fri Jan 10 15:28:26 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 15:28:26 GMT Subject: RFR: 8346607: IGV: Support drag-and-drop for opening graph files Message-ID: Adds the ability to drag-and-drop graph files into IGV. 
A welcome screen invites users to drag and drop files: https://github.com/user-attachments/assets/e08810bd-4e75-42d0-9507-fb24366803f9 The welcome screen disappears while any graphs have been imported (via opening a file or via socket) and reappears automatically if no graphs are imported in the workspace (that is, if the list on the left-hand side is completely empty). Furthermore, graphs can be dropped at any time in the areas in the green boxes in the image below. The reason only these areas work is due to the unfortunate way in which Apache NetBeans handles drag and drop events, as many components just consume events they cannot process without offering any apparent way to customize the drop behavior. Screenshot 2025-01-10 at 16 11 13 ------------- Commit messages: - Fix whitespace - Improve show hide logic - Implement file drag and drop Changes: https://git.openjdk.org/jdk/pull/23040/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23040&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346607 Stats: 148 lines in 2 files changed: 138 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/23040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23040/head:pull/23040 PR: https://git.openjdk.org/jdk/pull/23040 From tweidmann at openjdk.org Fri Jan 10 15:33:17 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Fri, 10 Jan 2025 15:33:17 GMT Subject: RFR: 8346607: IGV: Support drag-and-drop for opening graph files [v2] In-Reply-To: References: Message-ID: > Adds the ability to drag-and-drop graph files into IGV. > > A welcome screen invites users to drag and drop files: > > https://github.com/user-attachments/assets/e08810bd-4e75-42d0-9507-fb24366803f9 > > The welcome screen disappears while any graphs have been imported (via opening a file or via socket) and reappears automatically if no graphs are imported in the workspace (that is, if the list on the left-hand side is completely empty). 
> > Furthermore, graphs can be dropped at any time in the areas in the green boxes in the image below. The reason only these areas work is due to the unfortunate way in which Apache NetBeans handles drag and drop events, as many components just consume events they cannot process without offering any apparent way to customize the drop behavior. > > Screenshot 2025-01-10 at 16 11 13 Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23040/files - new: https://git.openjdk.org/jdk/pull/23040/files/a011853d..9be4df50 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23040&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23040&range=00-01 Stats: 117 lines in 1 file changed: 59 ins; 57 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23040.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23040/head:pull/23040 PR: https://git.openjdk.org/jdk/pull/23040 From mdoerr at openjdk.org Fri Jan 10 15:36:46 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Jan 2025 15:36:46 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 13:12:45 GMT, Martin Doerr wrote: > 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c > 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. > > Testing: > make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" > All passed. 
> > This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): > > `lir_abs` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpandnd %zmm0,%zmm1,%zmm0 > > > `lir_neg` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpxord %zmm0,%zmm1,%zmm0 > > > (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) > > `lir_abs` with this patch: > > vandpd 0xa1b213d(%rip),%xmm0,%xmm0 > > > `lir_neg` with this patch: > > vxorpd 0xa12585d(%rip),%xmm0,%xmm0 > > > New code is faster on our machine (using -XX:+UseKNLSetting). @vnkozlov: This is now the version with updates from @sviswa7. Could you help finding a reviewer and making sure it gets properly tested, please? I think this cleanup makes the code better readable (and also a bit faster). I also prefer handling `UseKNLSetting` in macroAssembler_x86.cpp instead of in multiple C1 files. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2582973107 From roland at openjdk.org Fri Jan 10 16:50:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 10 Jan 2025 16:50:56 GMT Subject: Integrated: 8346184: C2: assert(has_node(i)) failed during split thru phi In-Reply-To: References: Message-ID: On Wed, 18 Dec 2024 16:57:45 GMT, Roland Westrelin wrote: > The assert fires during split thru phi because a call to `Identity` > returns a new node (a constant null pointer). That happens because a > `Load`, once pushed thru phi, can be constant folded because it loads > from a newly allocated array. `Identity` shouldn't return new > nodes. When split thru phi runs, in this case, `Value` should be the > one returning constant null, not `Identity`. There is logic for that > in `LoadNode::Value` but it's after some other checks that cause > `Value` to return too early. > > To fix this, I propose reordering checks in `LoadNode::Value`. This pull request has now been integrated. 
Changeset: 9cf7d42b Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/9cf7d42b65cfecfe27d0267f971acb743c02b675 Stats: 93 lines in 2 files changed: 79 ins; 14 del; 0 mod 8346184: C2: assert(has_node(i)) failed during split thru phi Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22818 From roland at openjdk.org Fri Jan 10 16:50:55 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 10 Jan 2025 16:50:55 GMT Subject: RFR: 8346184: C2: assert(has_node(i)) failed during split thru phi [v3] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 08:52:16 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> Update src/hotspot/share/opto/memnode.cpp >> >> Co-authored-by: Christian Hagedorn > > Testing looks good! @chhagedorn @TobiHartmann thanks for testing and reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22818#issuecomment-2583234450 From aph at openjdk.org Fri Jan 10 16:53:45 2025 From: aph at openjdk.org (Andrew Haley) Date: Fri, 10 Jan 2025 16:53:45 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v2] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 13:22:50 GMT, kuaiwei wrote: >> This patch enhance MergeStores optimization to support merge value with reverse byte order. 
>> >> Below is benchmark result before and after the patch: >> >> On aliyun g8y (aarch64) >> |name | before | score2 | ratio | >> |---|---|---|---| >> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %| >> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %| >> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %| >> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %| >> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %| >> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %| >> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %| >> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %| >> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %| >> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %| >> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %| >> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %| >> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %| >> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %| >> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %| >> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %| >> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %| >> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %| >> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %| >> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %| >> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %| >> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %| >> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %| >> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %| >> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %| >> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %| >> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %| >> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | 
-0.02 %| >> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %| >> >> AMD EPYC 9T24 >> ... > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > Fix as review comments I haven't reviewed the code in detail, but I can confirm that it seems to work well, and the generated code on AArch64 looks excellent. Bravo! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2583246567 From lmesnik at openjdk.org Fri Jan 10 17:40:36 2025 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 10 Jan 2025 17:40:36 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: <3d7ezYP3-By8ArAoM6IkKGH8XHZ4VNVcviiMzMy2EPQ=.d82874c8-40e0-42ce-808c-527e44aac2dc@github.com> Message-ID: On Wed, 8 Jan 2025 12:45:48 GMT, Tobias Hartmann wrote: >> OK, I was confused by this in PR body then: >> >>> I was able to reliably reproduce the issue with compiler/arraycopy/TestArrayCopyNoInit.java and -XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:-UseCompressedClassPointers on Linux AArch64 and verified that the fix solves the problem. >> >> But fine, if it reproduces with +UCOH, let it be there. > > Ah, that's actually a typo, good catch. Should be `-XX:+UseCompactObjectHeaders`. I'll fix it in the description. Can you please add this '@run' as a separate test case with its own id, so that it is easier to identify and exclude the failures?
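As an illustration of what such a request usually amounts to, a jtreg file can carry several test descriptions, each with its own `id`. The sketch below is hypothetical: the class name `ArrayCopyGuardSketch`, the ids `default` and `compact-headers`, and the test body are assumptions made for this example and do not reproduce the real header or contents of `TestArrayCopyNoInit.java`. Only the VM flags are the ones mentioned in this thread.

```java
/*
 * @test id=default
 * @summary Hypothetical sketch: baseline configuration
 * @run main/othervm ArrayCopyGuardSketch
 */

/*
 * @test id=compact-headers
 * @summary Hypothetical sketch: same test under the flags from the reproduction
 * @run main/othervm -XX:-UseTLAB -XX:+UnlockExperimentalVMOptions
 *                   -XX:+UseCompactObjectHeaders ArrayCopyGuardSketch
 */

public class ArrayCopyGuardSketch {
    // Returns true if System.arraycopy rejects the given source object.
    // A non-array source must be rejected with ArrayStoreException.
    static boolean rejectsSource(Object src) {
        try {
            System.arraycopy(src, 0, new int[1], 0, 1);
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Repeat the call so the method is likely to get JIT-compiled
        // (the iteration count is an arbitrary choice for this sketch).
        for (int i = 0; i < 20_000; i++) {
            if (!rejectsSource(new Object())) {
                throw new AssertionError("non-array source was accepted");
            }
        }
    }
}
```

With separate ids, a failure is reported against one specific configuration, so that configuration can be problem-listed on its own without excluding the other.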
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22967#discussion_r1910706521 From kvn at openjdk.org Fri Jan 10 20:06:38 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 20:06:38 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 10:22:12 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Copyright date Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22967#pullrequestreview-2543636498 From kvn at openjdk.org Fri Jan 10 20:42:42 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 20:42:42 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 14:02:32 GMT, Damon Fenacci wrote: >> This reverts _8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)_ (commit 633fad8). >> The fix increased the code cache but was incomplete and didn't fix the underlying issue with code cache allocation potentially crashing (for the latter we have JBS issue [JDK-8339700](https://bugs.openjdk.org/browse/JDK-8339700)). >> >> The `compiler/compilerDirectives.hpp` and `ci/ciStreams.hpp` were removed by [JDK-8345801](https://bugs.openjdk.org/browse/JDK-8345801) because the symbols needed were imported by `#include "c1/c1_Compiler.hpp"`. By removing this we need to put the 2 includes back. > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8347407: re-fix JBS issue number in problem list file Good ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23031#pullrequestreview-2543771293 From kvn at openjdk.org Fri Jan 10 20:43:40 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 20:43:40 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 13:12:45 GMT, Martin Doerr wrote: > 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c > 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. 
> > Testing: > make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" > All passed. > > This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): > > `lir_abs` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpandnd %zmm0,%zmm1,%zmm0 > > > `lir_neg` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpxord %zmm0,%zmm1,%zmm0 > > > (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) > > `lir_abs` with this patch: > > vandpd 0xa1b213d(%rip),%xmm0,%xmm0 > > > `lir_neg` with this patch: > > vxorpd 0xa12585d(%rip),%xmm0,%xmm0 > > > New code is faster on our machine (using -XX:+UseKNLSetting). I will submit testing and let you know results. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2584084833 From kvn at openjdk.org Fri Jan 10 22:03:37 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 22:03:37 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: <_9-6VH-KEelXRHjxtg2LwJcn8Z0tBZWN20MFA1FBPes=.9d5a6075-55db-4242-b6ec-dd58521b1f67@github.com> On Fri, 10 Jan 2025 15:33:31 GMT, Martin Doerr wrote: >> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c >> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. >> >> Testing: >> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" >> All passed. >> >> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): >> >> `lir_abs` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpandnd %zmm0,%zmm1,%zmm0 >> >> >> `lir_neg` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpxord %zmm0,%zmm1,%zmm0 >> >> >> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) 
>> >> `lir_abs` with this patch: >> >> vandpd 0xa1b213d(%rip),%xmm0,%xmm0 >> >> >> `lir_neg` with this patch: >> >> vxorpd 0xa12585d(%rip),%xmm0,%xmm0 >> >> >> New code is faster on our machine (using -XX:+UseKNLSetting). > > @vnkozlov: This is now the version with updates from @sviswa7. Could you help finding a reviewer and making sure it gets properly tested, please? I think this cleanup makes the code better readable (and also a bit faster). I also prefer handling `UseKNLSetting` in macroAssembler_x86.cpp instead of in multiple C1 files. @TheRealMDoerr please update Copyright year while I am testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2584435413 From kvn at openjdk.org Fri Jan 10 22:16:47 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 22:16:47 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6] In-Reply-To: <6fqwGHPo8z6GSKCjH3i0rv9OvxAqKUTTLsdh8aktG0w=.d7a7e025-d7ed-4ef5-8c8d-2a2b3deb0ee3@github.com> References: <6fqwGHPo8z6GSKCjH3i0rv9OvxAqKUTTLsdh8aktG0w=.d7a7e025-d7ed-4ef5-8c8d-2a2b3deb0ee3@github.com> Message-ID: <2LzPW0WMfjDEldr9y7IBLwYC51mnckCiKEvDoEqG-qM=.1c9c933d-2a5b-49e2-9d74-3d3619c293c8@github.com> On Thu, 9 Jan 2025 14:35:58 GMT, Theo Weidmann wrote: >> C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 >> >> The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. 
Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls: >> >> >> public static double process(final double x) { >> double w = (double) 0.1; >> double p = 0; >> p = (double) (3.109615012413746E307 % (w % Z)); >> p = (double) (7.614949555185036E307 / (x % x)); // <- return value only dependends on this line >> return (double) (x * p); >> } >> >> >> To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated. > > Theo Weidmann has updated the pull request incrementally with three additional commits since the last revision: > > - Address comments > - Actually return top > - Update divnode.cpp Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22786#pullrequestreview-2544081976 From kvn at openjdk.org Fri Jan 10 22:39:34 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 22:39:34 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: <-3gw5JoZFgSB0hnfYu6xMsmSuXmxOQQ-gnVmbNGAg7A=.fb319060-389a-46b0-8c78-93e039405b5d@github.com> On Fri, 10 Jan 2025 06:20:08 GMT, Emanuel Peter wrote: > `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. > > However, there was an `assert` that checked that such an overflow would never be encountered. 
We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). > > Why did we not hit this assert before? > `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. > > But now we found a second case: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 > > In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 > > In the regression test, we have this example: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 > > We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. > > Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. > > I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) 
check that does not optimize the `LoopLimitNode` if the inputs are constants:
>
> if (in(Init)->is_Con() && in(Limit)->is_Con())
>   return nullptr; // Value
>
> The assumption seems to be that we want `Value` to do the constant folding here - but of course we di...

Good.

> I could not just remove the assert, because in LoopLimitNode::Ideal we have this (strange?) check that does not optimize the LoopLimitNode if the inputs are constants:

Maybe we check it so that we do not touch this loop until we fully unroll it (for a small number of iterations).

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/23024#pullrequestreview-2544138455
PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2584586784

From mdoerr at openjdk.org Fri Jan 10 23:03:20 2025
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 10 Jan 2025 23:03:20 GMT
Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1
In-Reply-To:
References:
Message-ID:

On Thu, 12 Dec 2024 13:12:45 GMT, Martin Doerr wrote:

> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c
> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1.
>
> Testing:
> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting"
> All passed.
>
> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting):
>
> `lir_abs` without this patch:
>
>   vmovsd -0x63(%rip),%xmm1
>   vpandnd %zmm0,%zmm1,%zmm0
>
>
> `lir_neg` without this patch:
>
>   vmovsd -0x63(%rip),%xmm1
>   vpxord %zmm0,%zmm1,%zmm0
>
>
> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.)
>
> `lir_abs` with this patch:
>
>   vandpd 0xa1b213d(%rip),%xmm0,%xmm0
>
>
> `lir_neg` with this patch:
>
>   vxorpd 0xa12585d(%rip),%xmm0,%xmm0
>
>
> New code is faster on our machine (using -XX:+UseKNLSetting).

Thank you!
The script only updated the Copyright year in macroAssembler_x86.cpp. All other files were only changed in 2024. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2584684172 From mdoerr at openjdk.org Fri Jan 10 23:03:20 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 10 Jan 2025 23:03:20 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> > 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c > 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. > > Testing: > make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" > All passed. > > This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): > > `lir_abs` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpandnd %zmm0,%zmm1,%zmm0 > > > `lir_neg` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpxord %zmm0,%zmm1,%zmm0 > > > (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) > > `lir_abs` with this patch: > > vandpd 0xa1b213d(%rip),%xmm0,%xmm0 > > > `lir_neg` with this patch: > > vxorpd 0xa12585d(%rip),%xmm0,%xmm0 > > > New code is faster on our machine (using -XX:+UseKNLSetting). Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Update Copyright year. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22709/files - new: https://git.openjdk.org/jdk/pull/22709/files/5576ac95..6d8dad1c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22709&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22709&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22709.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22709/head:pull/22709 PR: https://git.openjdk.org/jdk/pull/22709 From kvn at openjdk.org Fri Jan 10 23:32:45 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 10 Jan 2025 23:32:45 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> References: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> Message-ID: On Fri, 10 Jan 2025 23:03:20 GMT, Martin Doerr wrote: >> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c >> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. >> >> Testing: >> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" >> All passed. >> >> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): >> >> `lir_abs` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpandnd %zmm0,%zmm1,%zmm0 >> >> >> `lir_neg` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpxord %zmm0,%zmm1,%zmm0 >> >> >> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) >> >> `lir_abs` with this patch: >> >> vandpd 0xa1b213d(%rip),%xmm0,%xmm0 >> >> >> `lir_neg` with this patch: >> >> vxorpd 0xa12585d(%rip),%xmm0,%xmm0 >> >> >> New code is faster on our machine (using -XX:+UseKNLSetting). 
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright year. What is with GHA testing failure? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2584824868 From qamai at openjdk.org Sat Jan 11 02:32:35 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 11 Jan 2025 02:32:35 GMT Subject: RFR: 8345766: C2 should emit macro nodes for ModF/ModD instead of calls during parsing [v6] In-Reply-To: <6fqwGHPo8z6GSKCjH3i0rv9OvxAqKUTTLsdh8aktG0w=.d7a7e025-d7ed-4ef5-8c8d-2a2b3deb0ee3@github.com> References: <6fqwGHPo8z6GSKCjH3i0rv9OvxAqKUTTLsdh8aktG0w=.d7a7e025-d7ed-4ef5-8c8d-2a2b3deb0ee3@github.com> Message-ID: On Thu, 9 Jan 2025 14:35:58 GMT, Theo Weidmann wrote: >> C2 currently emits runtime calls if the platform rules do not support lowering floating point remainder operations. For example, for float: >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L2305-L2318 >> >> https://github.com/openjdk/jdk/blob/fbbc7c35f422294090b8c7a02a19ab2fb67c7070/src/hotspot/share/opto/parse2.cpp#L1099-L1109 >> >> The only platform, which currently supports this, however, is x86_32. On all other platforms, runtime calls are generated directly during parsing, which prevent any constant folding or other idealizations. Even C1 can perform these optimizations, resulting in significantly lower C2 performance compared to C1 for simple test cases. 
This function was observed to be around 15x slower with C2 compared to C1 due to redundant runtime calls:
>>
>>
>> public static double process(final double x) {
>>     double w = (double) 0.1;
>>     double p = 0;
>>     p = (double) (3.109615012413746E307 % (w % Z));
>>     p = (double) (7.614949555185036E307 / (x % x)); // <- return value only depends on this line
>>     return (double) (x * p);
>> }
>>
>>
>> To fix this, this PR turns ModFNode and ModDNode into macros, which are always created during parsing. They support idealization (constant folding) and are lowered to runtime calls during macro expansion. For simplicity, these operations will now also call into the runtime on x86_32, as this platform is deprecated.
>
> Theo Weidmann has updated the pull request incrementally with three additional commits since the last revision:
>
> - Address comments
> - Actually return top
> - Update divnode.cpp

Sorry for being late here. What do you think about using a general-purpose `CallPureNode` that represents a call that neither reads nor writes externally modifiable state? Apart from `ModF` and `ModD`, there are several other nodes that may benefit from this, such as the trigonometric functions, svml calls, etc. A `CallPureNode` does not have input and output control or memory, which makes it more susceptible to GVN and dead code elimination, and allows it to be scheduled more freely.
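As an illustration of why a pure remainder operation is easy to optimize, here is a minimal source-level sketch. It is not C2 code: the class name, the value chosen for `Z` (left undefined in the quoted snippet), and the `processSimplified` method are assumptions made for this example only. It shows that the first assignment to `p` is dead, so a compiler that treats `%` as side-effect free may drop it without changing the result.

```java
public class ModFoldingSketch {
    // Hypothetical stand-in for the constant Z, which the quoted snippet does not define.
    static final double Z = 3.0;

    // Same shape as the quoted process() method.
    static double process(final double x) {
        double w = 0.1;
        // Dead computation: p is overwritten before it is ever read.
        double p = 3.109615012413746E307 % (w % Z);
        p = 7.614949555185036E307 / (x % x);
        return x * p;
    }

    // What remains once the dead remainder computation is eliminated.
    static double processSimplified(final double x) {
        return x * (7.614949555185036E307 / (x % x));
    }

    public static void main(String[] args) {
        double[] inputs = {1.5, -2.0, 0.0, Double.NaN, Double.POSITIVE_INFINITY};
        for (double x : inputs) {
            double a = process(x);
            double b = processSimplified(x);
            // Double.compare treats NaN as equal to NaN, which is what we want here.
            System.out.println(x + ": " + a + " vs " + b
                    + " identical=" + (Double.compare(a, b) == 0));
        }
    }
}
```

When the remainder is an opaque runtime call emitted during parsing, the compiler cannot perform this simplification, which is the cost the PR description refers to.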
-------------

PR Comment: https://git.openjdk.org/jdk/pull/22786#issuecomment-2585016091

From kvn at openjdk.org Sat Jan 11 03:43:42 2025
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Sat, 11 Jan 2025 03:43:42 GMT
Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2]
In-Reply-To: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com>
References: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com>
Message-ID:

On Fri, 10 Jan 2025 23:03:20 GMT, Martin Doerr wrote:

>> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c
>> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1.
>>
>> Testing:
>> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting"
>> All passed.
>>
>> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting):
>>
>> `lir_abs` without this patch:
>>
>>   vmovsd -0x63(%rip),%xmm1
>>   vpandnd %zmm0,%zmm1,%zmm0
>>
>>
>> `lir_neg` without this patch:
>>
>>   vmovsd -0x63(%rip),%xmm1
>>   vpxord %zmm0,%zmm1,%zmm0
>>
>>
>> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.)
>>
>> `lir_abs` with this patch:
>>
>>   vandpd 0xa1b213d(%rip),%xmm0,%xmm0
>>
>>
>> `lir_neg` with this patch:
>>
>>   vxorpd 0xa12585d(%rip),%xmm0,%xmm0
>>
>>
>> New code is faster on our machine (using -XX:+UseKNLSetting).
>
> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>
> Update Copyright year.

My testing passed. The changes seem fine. You need a second review.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22709#pullrequestreview-2544524391 From mdoerr at openjdk.org Sat Jan 11 20:25:37 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 11 Jan 2025 20:25:37 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> Message-ID: On Fri, 10 Jan 2025 23:30:26 GMT, Vladimir Kozlov wrote: > What is with GHA testing failure? "wget" has failed. That happens sometimes. Maybe it works after restart. Anyway, GHA had passed before the Copyright header change. Plus your and our tests. Thanks for testing and your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2585397169 From qamai at openjdk.org Sun Jan 12 13:18:35 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 12 Jan 2025 13:18:35 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 06:20:08 GMT, Emanuel Peter wrote: > `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. > > However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). > > Why did we not hit this assert before? > `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. 
> > But now we found a second case: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 > > In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 > > In the regression test, we have this example: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 > > We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. > > Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. > > I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: > > if (in(Init)->is_Con() && in(Limit)->is_Con()) > return nullptr; // Value > > The assumption seems to be that we want `Value` to do the constant folding here - but of course we di... > I think this check can reasonably be removed, because `Value` should be called before `Ideal` anyway, and so if we can constant fold because of constant inputs, we would have already done so. I think you are mistaken here, `Ideal` is called before `Value`. 
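For readers following the thread, the overflow case quoted above can be reproduced with plain arithmetic. The following is a hedged sketch, not the HotSpot implementation: the class and method names are made up for this example, and the rounding formula is the usual "round the trip count up to a whole number of strides" computation, which is assumed here to match the shape of the check `final_con == (jlong)final_int` mentioned in the PR description.

```java
public class LoopLimitOverflowSketch {
    // Final value of the induction variable for a counted loop, computed in
    // 64-bit arithmetic so that the result cannot wrap. Assumes stride != 0.
    static long finalValue(int init, int limit, int stride) {
        long strideM = stride - (stride > 0 ? 1 : -1);
        long tripCount = ((long) limit - init + strideM) / stride;
        return init + (long) stride * tripCount;
    }

    // True if the 64-bit result survives a round trip through int,
    // i.e. no overflow happened and constant folding to an int is safe.
    static boolean fitsInInt(long value) {
        return value == (int) value;
    }

    public static void main(String[] args) {
        long small = finalValue(0, 1000, 4);
        long large = finalValue(0, Integer.MAX_VALUE, 4);
        System.out.println("limit=1000:       final=" + small + " fits=" + fitsInInt(small));
        System.out.println("limit=2147483647: final=" + large + " fits=" + fitsInInt(large));
    }
}
```

With `init=0`, `limit=2147483647` and `stride=4`, the rounded-up final value is 2^31, which no longer fits in an `int`. That is the branch of the `Phi` for which the temporary clone hits the overflow case.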
> Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. In this case, can we return `Type::TOP`, so that if this assumption is false we will get an error? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2585731038 From qamai at openjdk.org Sun Jan 12 13:48:03 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 12 Jan 2025 13:48:03 GMT Subject: RFR: 8347481: C2: Remove the control input of some nodes Message-ID: Hi, While working on [JDK-8347365](https://bugs.openjdk.org/browse/JDK-8347365), I noticed that there are some nodes that have their control inputs being set in a seemingly erroneous manner. This patch removes the control inputs for those nodes. Please review this PR, thanks a lot. ------------- Commit messages: - remove control inputs from several nodes Changes: https://git.openjdk.org/jdk/pull/23055/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23055&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347481 Stats: 58 lines in 8 files changed: 0 ins; 2 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/23055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23055/head:pull/23055 PR: https://git.openjdk.org/jdk/pull/23055 From qxing at openjdk.org Mon Jan 13 01:16:57 2025 From: qxing at openjdk.org (Qizheng Xing) Date: Mon, 13 Jan 2025 01:16:57 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops Message-ID: In `PhaseIdealLoop`, `IdealLoopTree::check_safepts` method checks if any call that is guaranteed to have a safepoint dominates the tail of the loop. In the previous implementation, `check_safepts` would stop if it found a local non-call safepoint. 
At this time, if there was a call before the safepoint in the dom-path, this safepoint would not be eliminated.

This patch changes the behavior of `check_safepts` to not stop when it finds a non-local safepoint. This makes simple loops with one method call ~3.8% faster (on aarch64).

Benchmark              Mode  Cnt       Score      Error  Units
LoopSafepoint.loopVar  avgt   15  208296.259 ± 1350.409  ns/op  # baseline
LoopSafepoint.loopVar  avgt   15  200692.874 ±  616.770  ns/op  # this patch

Testing: tier1-2 on x86_64 and aarch64.

-------------

Commit messages:
 - Add IR test and microbench.
 - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops.

Changes: https://git.openjdk.org/jdk/pull/23057/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23057&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8347499
Stats: 166 lines in 3 files changed: 162 ins; 2 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/23057.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/23057/head:pull/23057

PR: https://git.openjdk.org/jdk/pull/23057

From amitkumar at openjdk.org Mon Jan 13 04:01:41 2025
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 13 Jan 2025 04:01:41 GMT
Subject: RFR: 8347405: MergeStores with reverse bytes order value [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 10 Jan 2025 13:22:50 GMT, kuaiwei wrote:

>> This patch enhances the MergeStores optimization to support merging values with reverse byte order.
>>
>> Below are the benchmark results before and after the patch:
>>
>> On aliyun g8y (aarch64)
>> |name | before | after | ratio |
>> |---|---|---|---|
>> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %|
>> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %|
>> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %|
>> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %|
>> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %|
>> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %|
>> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %|
>> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %|
>> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %|
>> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %|
>> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %|
>> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %|
>> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %|
>> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %|
>> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %|
>> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %|
>> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %|
>> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %|
>> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %|
>> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %|
>> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %|
>> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %|
>> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %|
>> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %|
>> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %|
>> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %|
>> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %|
>> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %|
>> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %|
>>
>> AMD EPYC 9T24
>> ...
>
> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix as review comments

src/hotspot/share/opto/memnode.cpp line 2799:

> 2797: // Forward -> Forward
> 2798: // Backward -> Backward
> 2799: enum ValueOrder { Unknown, Forward, Backward };

Can we update it to:

Suggestion:

  enum ValueOrder : uint8_t { Unknown, Forward, Backward };

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1912642160

From thartmann at openjdk.org Mon Jan 13 06:11:36 2025
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 13 Jan 2025 06:11:36 GMT
Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4]
In-Reply-To:
References:
Message-ID: <_NHQ3uM6mAOsnk_h6vMfR_7rScZq0_utgLE37hTBakM=.f00fe780-d4fb-450a-a474-269294f50e07@github.com>

On Wed, 8 Jan 2025 12:17:14 GMT, Roland Westrelin wrote:

>> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Copyright date
>
> Looks good to me.

Thanks for the review, Vladimir! @rwestrel are you okay with the changes as well?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2586236419

From epeter at openjdk.org Mon Jan 13 07:41:39 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 13 Jan 2025 07:41:39 GMT
Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs
In-Reply-To:
References:
Message-ID:

On Sun, 12 Jan 2025 13:16:07 GMT, Quan Anh Mai wrote:

>> `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case.
>>
>> However, there was an `assert` that checked that such an overflow would never be encountered.
We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). >> >> Why did we not hit this assert before? >> `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. >> >> But now we found a second case: >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 >> >> In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 >> >> In the regression test, we have this example: >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 >> >> We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. >> >> Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? >> We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. >> >> I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) 
check that does not optimize the `LoopLimitNode` if the inputs are constants: >> >> if (in(Init)->is_Con() && in(Limit)->is_Con()) >> return nullptr; // Value >> >> The assumption seems to be that we want `Value`... > >> I think this check can reasonably be removed, because `Value` should be called before `Ideal` anyway, and so if we can constant fold because of constant inputs, we would have already done so. > > I think you are mistaken here, `Ideal` is called before `Value`. > >> Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? >> We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. > > In this case, can we return `Type::TOP`, so that if this assumption is false we will get an error? @merykitty > I think you are mistaken here, Ideal is called before Value. Yikes, I think you are right. Still, the lowering happens only after post-loop-opts phase, so the Value optimization could have constant folded it in most cases by then. I hope that is good enough. The alternative is to test-run `Value` during `Ideal`, and check if it would constant fold... but that is a little hacky too. > In this case, can we return Type::TOP, so that if this assumption is false we will get an error? Hmm, I'm not sure. I'm a little worried about the if here: `int x = flag ? 1000 : 2147483647; ` The `LoopLimitNode` gets split through the `Phi` of the `Region` of this `If`. If the `LoopLimitNode` constant folds to TOP on the right branch, then the phi collapses. But the `If` here does not need to collapse. Indeed: we can take the right branch of the if, but we just cannot enter the loop after having taken it. Also: `LoopLimitNode::Ideal` generates nodes in the lowering. Those could in principle also reach an overflow case, and in some strange case this could later constant fold. Then we would not get TOP either... 
I think we should give back a valid int value or range for the `Value` case as well, and not TOP. I would rather do that than possibly mess up the graph. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2586401300 From dfenacci at openjdk.org Mon Jan 13 08:02:40 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 Jan 2025 08:02:40 GMT Subject: RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v3] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 20:39:31 GMT, Vladimir Kozlov wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8347407: re-fix JBS issue number in problem list file > > Good Thanks @vnkozlov for your review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23031#issuecomment-2586428533 From dfenacci at openjdk.org Mon Jan 13 08:02:41 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 Jan 2025 08:02:41 GMT Subject: Integrated: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 09:34:19 GMT, Damon Fenacci wrote: > This reverts _8326615: C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash)_ (commit 633fad8). > The fix increased the code cache but was incomplete and didn't fix the underlying issue with code cache allocation potentially crashing (for the latter we have JBS issue [JDK-8339700](https://bugs.openjdk.org/browse/JDK-8339700)). > > The `compiler/compilerDirectives.hpp` and `ci/ciStreams.hpp` were removed by [JDK-8345801](https://bugs.openjdk.org/browse/JDK-8345801) because the symbols needed were imported by `#include "c1/c1_Compiler.hpp"`. By removing this we need to put the 2 includes back.
This pull request has now been integrated. Changeset: b37f1236 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/b37f12362507fb2cd291a2b44b4777ba76efd35e Stats: 44 lines in 9 files changed: 5 ins; 26 del; 13 mod 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/23031 From chagedorn at openjdk.org Mon Jan 13 08:11:47 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 08:11:47 GMT Subject: Integrated: 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor In-Reply-To: References: Message-ID: <4PY4M8boFXS_H8DZIwUrUBW8As_wqw8NCMMfGbeCGqs=.6b7f6504-306e-497c-880b-aee72f1e1502@github.com> On Thu, 19 Dec 2024 13:56:41 GMT, Christian Hagedorn wrote: > This patch is a follow up to the clean-ups done with [JDK-8342945](https://bugs.openjdk.org/browse/JDK-8342945) and introduces a new predicate visitor for Loop Unswitching to update the last remaining custom predicate cloning code. > > This patch includes the following: > > - New `CloneUnswitchedLoopPredicatesVisitor` class which delegates the cloning work to a new `ClonePredicateToTargetLoop` class. > - We walk the predicate chain in the `PredicateIterator` and call the `CloneUnswitchedLoopPredicatesVisitor` for each visited predicate. Then we clone the predicate on the fly to the target loop. > - New `ClonePredicateToTargetLoop` class: > - Clones Parse Predicates > - Clones Template Assertion Predicates > - Includes rewiring of control dependent data nodes > - Rewires the cloned predicates to the target loop with new `TargetLoopPredicateChain` class: > - Keeps track of the current chain head, which is the target loop itself when the chain is still empty. > - Each time a new predicate is inserted at the target loop, the old predicate chain head is set as output of the new predicate. 
> - An example is shown as class comment at `TargetLoopPredicateChain`. > - I plan to reuse this class later again when also updating `CreateAssertionPredicatesVisitor` which is done when we tackle the actual still remaining Assertion Predicate bugs. > - Removal of custom predicate cloning code found in `PhaseIdealLoop`. > - Changed steps performed in Loop Unswitching from: > 1. Clone loop > 2. Clone predicates and insert them below the unswitched loop selector If projections > 3. Connect the cloned predicates to the unswitched loops > > to: > > 1. Clone loop > 2. Connect unswitched loop selector If projections to unswitched loops such that they are now the new loop entries > 3. Clone predicates and insert them between the unswitched loop selector If projections and the unswitched loops > - Rename/update `get_template_assertion_predicates()`/`TemplateAssertionPredicateCollector` to reflect the only use left. > > Thanks, > Christian This pull request has now been integrated. Changeset: ed0b5556 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/ed0b5556276cd8bb5e4a4d1f34a49c4442e2a34e Stats: 443 lines in 7 files changed: 227 ins; 190 del; 26 mod 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Reviewed-by: roland, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22828 From roland at openjdk.org Mon Jan 13 08:14:40 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Jan 2025 08:14:40 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 10:22:12 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> 
https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Copyright date Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22967#pullrequestreview-2545954242 From djelinski at openjdk.org Mon Jan 13 08:17:04 2025 From: djelinski at openjdk.org (Daniel Jeliński) Date: Mon, 13 Jan 2025 08:17:04 GMT Subject: Integrated: 8345471: Clean up compiler/intrinsics/sha/cli tests In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 15:48:52 GMT, Daniel Jeliński wrote: > Merge all the GenericTestCaseForUnsupportedXXXCPU and GenericTestCaseForOtherCPU into GenericTestCaseForUnsupportedCPU.java. > > The CPU-specific files are almost identical; I chose to resolve the differences in favor of the AArch64 version.
The OtherCPU version looks wrong, and it wasn't executed on any supported platform. > > The tests continue to pass on linux-aarch64/x64, windows-x64 and mac-aarch64. I didn't test other platforms. > > After the change, the tests will start running on PPC and S390. They will also automatically run on any new architectures. > > For those interested in historical background, when the tests were introduced, there were only 2 supported CPU architectures. X86 did not support any of the intrinsics, and the X86 test case did not even call `getPredicateForOption`. The call to `getPredicateForOption` was added in f2e9b827d699115f8683e9def06c249e5476fd50, and since then all the cases are the same. This pull request has now been integrated. Changeset: 3b9732ed Author: Daniel Jeliński URL: https://git.openjdk.org/jdk/commit/3b9732edc6dd22868634166678d220bf1066e5be Stats: 628 lines in 11 files changed: 114 ins; 497 del; 17 mod 8345471: Clean up compiler/intrinsics/sha/cli tests Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/22517 From jbhateja at openjdk.org Mon Jan 13 09:06:12 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 09:06:12 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v10] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch: > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operations are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22754/files - new: https://git.openjdk.org/jdk/pull/22754/files/175f4ed2..43aa3eb7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=08-09 Stats: 22 lines in 5 files changed: 5 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/22754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754 PR: https://git.openjdk.org/jdk/pull/22754 From jbhateja at openjdk.org Mon Jan 13 09:06:13 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 09:06:13 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> 
References: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> Message-ID: <88_pE_E7P1iOkpSUuLuou6wH9UxWvPx83MFo033dY2Y=.d942086a-e87f-45dd-8c1d-72b8fd9c85d6@github.com> On Thu, 9 Jan 2025 13:13:30 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/share/opto/superword.cpp line 2567: > >> 2565: // half float to float, in such a case back propagation of narrow type (SHORT) >> 2566: // may not be possible. >> 2567: if (n->Opcode() == Op_ConvF2HF || n->Opcode() == Op_ReinterpretHF2S) { > > Is this relevant, or does that belong to a different (vector) RFE? It makes sure to assign a SHORT container type to the ReinterpretHF2S node which could be succeeded by a ConvHF2F IR which expects its inputs to be of SHORT type. 
During early phase of SLP extraction we get into a control flow querying the implemented vector IR opcode through split_packs_only_implemented_with_smaller_size https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/vectornode.cpp#L1446 This scenario is tested by following JTREG [test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java](https://github.com/openjdk/jdk/pull/22754/files#diff-7e7404a977d8ca567f8005b80bd840ea2e722c022e7187fa2dd21df4a5837faaR49) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1912858395 From jbhateja at openjdk.org Mon Jan 13 09:06:14 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 09:06:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: References: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> Message-ID: On Thu, 9 Jan 2025 19:22:35 GMT, Paul Sandoz wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1434: >> >>> 1432: return float16ToRawShortBits(valueOf(product + float16ToFloat(f16c))); >>> 1433: }); >>> 1434: return shortBitsToFloat16(res); >> >> I don't understand what is happening here. But I leave this to @PaulSandoz to review > > Uncertain on what bits, but i am guessing it's mostly related to the fallback code in the lambda. To avoid the intrinsics operating on Float16 instances we instead "unpack" the carrier (16bits) values and pass those as arguments to the intrinsic. The fallback (when intrinsification is not supported) also accepts those carrier values as arguments and we convert the carriers to floats, operate on then, convert to the carrier, and then back to float16 on the result. 
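The carrier-based fallback described above (unpack the 16-bit carriers, widen, operate, narrow back) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the code from the PR: the class `Float16CarrierSketch` and the method `fmaOnCarriers` are hypothetical names, the sketch needs JDK 20 or later for `Float.float16ToFloat` and `Float.floatToFloat16`, and it narrows through `float`, which may round twice in corner cases, whereas the snippet quoted from `Float16.java` goes through `valueOf`.

```java
// Hypothetical sketch of a fallback that operates on raw binary16 carriers.
final class Float16CarrierSketch {

    // a, b, c are IEEE 754 binary16 bit patterns stored in shorts.
    static short fmaOnCarriers(short a, short b, short c) {
        // Widen each carrier to float (exact: every binary16 value is a float).
        float fa = Float.float16ToFloat(a);
        float fb = Float.float16ToFloat(b);
        float fc = Float.float16ToFloat(c);
        // The product of two binary16 values is exactly representable in double.
        double product = (double) fa * (double) fb;
        double sum = product + fc;
        // Narrow back to the carrier. Assumption: going through float is
        // acceptable for illustration; it is not guaranteed to match a
        // correctly rounded fused multiply-add in every case.
        return Float.floatToFloat16((float) sum);
    }

    public static void main(String[] args) {
        short a = Float.floatToFloat16(2.0f);
        short b = Float.floatToFloat16(3.0f);
        short c = Float.floatToFloat16(1.0f);
        short r = fmaOnCarriers(a, b, c);
        System.out.println(Float.float16ToFloat(r));
    }
}
```

Keeping the boxing and unboxing on the caller's side, as in this sketch, means the method itself only ever sees `short` values, which is the property the intrinsic interface relies on.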
> > The code in the lambda could potentially be simplified if `Float16Math.fma` accepted six arguments the first three being the carrier values used by the intrinsic, and the subsequent three being the float16 values used by the fallback. Then we could express the code in the original source in the lambda. I believe when intrinsified there would be no penalty for those extra arguments. Hi @PaulSandoz, in the current scheme we pass unboxed carriers to the intrinsic entry point. In the fallback implementation, the carrier is first converted to a floating point value using the `Float.float16ToFloat` API, which expects a short argument; after the operation, we convert the float value back to the carrier type (short) using the `Float.floatToFloat16` API, which expects a float argument. Our intent here is to perform unboxing and boxing outside the intrinsic, thereby avoiding all the complexities around boxing by the compiler. Even if we pass 3 additional parameters, we still need to use `Float16.floatValue`, which invokes `Float.float16ToFloat` underneath, so this minor modification on the Java side is on account of optimizing the intrinsic interface. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1912858286 From rcastanedalo at openjdk.org Mon Jan 13 09:33:59 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 13 Jan 2025 09:33:59 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 09:59:44 GMT, Roberto Castañeda Lozano wrote: > Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335).
> > This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: > > ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) > > Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: > > ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) > > The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. > > A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). > > #### Testing > - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). Putting this PR on hold until the related RFE [JDK-8341611](https://bugs.openjdk.org/browse/JDK-8341611) (with PR https://github.com/openjdk/jdk/pull/22862 under review) is resolved. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22678#issuecomment-2586601599 From jbhateja at openjdk.org Mon Jan 13 09:51:28 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 09:51:28 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX Message-ID: This bug fix patch fixes an internal error seen during safepoint handler execution. The problem occurs due to missing handling for REX2 prefixed polling test instruction. Manually verified the patch with the -XX:+SafepointALot runtime flag. Best Regards, Jatin PS: Patch will be opened for review after some validation. ------------- Commit messages: - 8347422: Crash during safepoint handler execution Changes: https://git.openjdk.org/jdk/pull/23035/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23035&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347422 Stats: 23 lines in 1 file changed: 20 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/23035.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23035/head:pull/23035 PR: https://git.openjdk.org/jdk/pull/23035 From thartmann at openjdk.org Mon Jan 13 09:52:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 09:52:51 GMT Subject: RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic [v4] In-Reply-To: References: Message-ID: On Thu, 9 Jan 2025 10:22:12 GMT, Tobias Hartmann wrote: >> C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 >> >> If these guards pass, the array length is loaded: >> https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 >> >> But since the `LoadRangeNode` is not pinned, it might float above the array guard: >> 
https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 >> >> If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. >> >> The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. >> >> Thanks to @shipilev for identifying the root cause! >> >> I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. >> >> Best regards, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Copyright date Thanks again, Roland! 
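For readers less familiar with the array guard mentioned in the quoted description, the Java-level behaviour it protects can be sketched as below. This does not reproduce the crash (that needs the VM flags and test named in the description); it only shows the documented contract that `System.arraycopy` throws `ArrayStoreException` when the source is not an array, which is the path on which the length load must not have been consumed. The class `ArrayGuardSketch` and the method `copyRejectsNonArray` are hypothetical names for this illustration.

```java
// Java-level view of the array guard: non-array sources must be rejected.
final class ArrayGuardSketch {

    // Returns true if System.arraycopy rejected src because it is not an array.
    static boolean copyRejectsNonArray(Object src) {
        int[] dst = new int[4];
        try {
            // The static type is Object, so only a runtime check can tell
            // whether src is an array and has a length to load at all.
            System.arraycopy(src, 0, dst, 0, 1);
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(copyRejectsNonArray(new Object()));
        System.out.println(copyRejectsNonArray(new int[] {1, 2, 3, 4}));
    }
}
```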
------------- PR Comment: https://git.openjdk.org/jdk/pull/22967#issuecomment-2586636675 From thartmann at openjdk.org Mon Jan 13 09:52:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 09:52:52 GMT Subject: Integrated: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: References: Message-ID: On Wed, 8 Jan 2025 12:07:16 GMT, Tobias Hartmann wrote: > C2's arraycopy intrinsic adds guards that check that the source and destination objects are arrays: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5917-L5919 > > If these guards pass, the array length is loaded: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/library_call.cpp#L5930-L5933 > > But since the `LoadRangeNode` is not pinned, it might float above the array guard: > https://github.com/openjdk/jdk/blob/afe543414f58a04832d4f07dea88881d64954a0b/src/hotspot/share/opto/graphKit.cpp#L1214 > > If the object is not an array, we will read garbage. That's usually fine because the result will not be used (the array guard will trigger) but with `-XX:+UseCompactObjectHeaders` it can happen that the memory right after the header is not mapped and we crash. > > The fix is to add a `CheckCastPPNode` to propagate the information that the operand is an array and prevent the load from floating. > > Thanks to @shipilev for identifying the root cause! > > I was able to reliably reproduce the issue with `compiler/arraycopy/TestArrayCopyNoInit.java` and `-XX:-UseTLAB -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders` on Linux AArch64 and verified that the fix solves the problem. > > Best regards, > Tobias This pull request has now been integrated. 
Changeset: 82e2a791 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/82e2a791225a289ba32360bf415274c4b48b9e00 Stats: 58 lines in 5 files changed: 14 ins; 0 del; 44 mod 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic Reviewed-by: roland, qamai, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22967 From jbhateja at openjdk.org Mon Jan 13 09:54:39 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 09:54:39 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v4] In-Reply-To: References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> Message-ID: On Wed, 8 Jan 2025 13:02:32 GMT, Emanuel Peter wrote: >> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. > > Looks promising, thanks for the work! Hi @eme64, Your comments have been addressed. Kindly verify. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22863#issuecomment-2586644050 From thartmann at openjdk.org Mon Jan 13 10:03:38 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 10:03:38 GMT Subject: [jdk24] RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic Message-ID: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> Hi all, This pull request contains a backport of commit [82e2a791](https://github.com/openjdk/jdk/commit/82e2a791225a289ba32360bf415274c4b48b9e00) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Tobias Hartmann on 13 Jan 2025 and was reviewed by Roland Westrelin, Quan Anh Mai and Vladimir Kozlov. Thanks! 
------------- Commit messages: - Backport 82e2a791225a289ba32360bf415274c4b48b9e00 Changes: https://git.openjdk.org/jdk/pull/23063/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23063&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347006 Stats: 58 lines in 5 files changed: 14 ins; 0 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/23063.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23063/head:pull/23063 PR: https://git.openjdk.org/jdk/pull/23063 From chagedorn at openjdk.org Mon Jan 13 10:21:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 10:21:50 GMT Subject: [jdk24] RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> References: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> Message-ID: <0Ne6wgfK-3iq9UW6gnlTcAeb1inawpuPZAj9Fqa7ArI=.d8779111-c32e-44f7-8705-888ef669d8b5@github.com> On Mon, 13 Jan 2025 09:57:01 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [82e2a791](https://github.com/openjdk/jdk/commit/82e2a791225a289ba32360bf415274c4b48b9e00) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 13 Jan 2025 and was reviewed by Roland Westrelin, Quan Anh Mai and Vladimir Kozlov. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23063#pullrequestreview-2546218675 From thartmann at openjdk.org Mon Jan 13 10:27:50 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 10:27:50 GMT Subject: [jdk24] RFR: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> References: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> Message-ID: <2uNuo6we9RJ4orQPuGCa-GGyTSgHvsGdfH4nRsjqUEQ=.db8bb385-bdc4-4870-ad08-b09b85afafc1@github.com> On Mon, 13 Jan 2025 09:57:01 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [82e2a791](https://github.com/openjdk/jdk/commit/82e2a791225a289ba32360bf415274c4b48b9e00) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 13 Jan 2025 and was reviewed by Roland Westrelin, Quan Anh Mai and Vladimir Kozlov. > > Thanks! Thanks for the review, Christian! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23063#issuecomment-2586720650 From thartmann at openjdk.org Mon Jan 13 10:30:35 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 10:30:35 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 13:24:21 GMT, Jatin Bhateja wrote: > This bug fix patch fixes an internal error seen during safepoint handler execution. > The problem occurs due to missing handling for REX2 prefixed polling test instruction. > > Manually verified the patch with the -XX:+SafepointALot runtime flag. > > Best Regards, > Jatin > PS: Patch will be opened for review after some validation. Hi @jatin-bhateja, please see my questions in JBS. Should we add a regression test for this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2586725946 From jbhateja at openjdk.org Mon Jan 13 11:13:33 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 11:13:33 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 10:27:39 GMT, Tobias Hartmann wrote: > Hi @jatin-bhateja, please see my questions in JBS. Should we add a regression test for this? Hi Tobias, To stress the validation I generally give preference to EGPRs by changing the static allocation order defined in the AD file. We already have regression tests in place for this in the following directory. https://github.com/openjdk/jdk/tree/master/test/hotspot/jtreg/compiler/c2/cr6340864 Please make sure to use a debug build to verify the fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2586808593 From dfenacci at openjdk.org Mon Jan 13 11:32:49 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 Jan 2025 11:32:49 GMT Subject: [jdk24] RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Message-ID: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> Hi all, This pull request contains a backport of commit [b37f1236](https://github.com/openjdk/jdk/commit/b37f12362507fb2cd291a2b44b4777ba76efd35e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Damon Fenacci on 13 Jan 2025 and was reviewed by Tobias Hartmann and Vladimir Kozlov. Thanks!
------------- Commit messages: - JDK-8347517: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Changes: https://git.openjdk.org/jdk/pull/23065/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23065&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347407 Stats: 42 lines in 9 files changed: 3 ins; 26 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/23065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23065/head:pull/23065 PR: https://git.openjdk.org/jdk/pull/23065 From thartmann at openjdk.org Mon Jan 13 11:35:35 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 11:35:35 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: References: Message-ID: <0C5TGgbOGV0xQnTEbtqaZw00hJ9JJneIQrJ91pTHCDc=.e49fc21d-77bb-4825-a5ac-e67f8a0c082a@github.com> On Fri, 10 Jan 2025 13:24:21 GMT, Jatin Bhateja wrote: > This bug fix patch fixes an internal error seen during safepoint handler execution. > The problem occurs due to missing handling for REX2 prefixed polling test instruction. > > Manually verified the patch with the -XX:+SafepointALot runtime flag. > > Best Regards, > Jatin > PS: Patch will be opened for review after some validation. Thanks Jatin. In this case I would suggest that we should add a VM stress flag to give preference to EGPRs, or maybe [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) is good enough? Do the existing tests trigger this even without `-XX:+SafepointALot`? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2586861370 From dfenacci at openjdk.org Mon Jan 13 11:58:21 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 Jan 2025 11:58:21 GMT Subject: [jdk24] RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> References: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> Message-ID: > Hi all, > > This pull request contains a backport of commit [b37f1236](https://github.com/openjdk/jdk/commit/b37f12362507fb2cd291a2b44b4777ba76efd35e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 13 Jan 2025 and was reviewed by Tobias Hartmann and Vladimir Kozlov. > > Includes that were re-added by the original backout in these 2 files > `src/hotspot/share/c1/c1_Compilation.hpp` > `src/hotspot/share/c1/c1_IR.hpp` > could not be cleanly applied but are not needed in the backport as the change happened after the jdk24 branch. > > Thanks! 
Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - Update c1_IR.hpp - Update c1_Compilation.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23065/files - new: https://git.openjdk.org/jdk/pull/23065/files/e82341e8..f5152046 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23065&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23065&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23065.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23065/head:pull/23065 PR: https://git.openjdk.org/jdk/pull/23065 From thartmann at openjdk.org Mon Jan 13 11:58:21 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 11:58:21 GMT Subject: [jdk24] RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> Message-ID: On Mon, 13 Jan 2025 11:55:37 GMT, Damon Fenacci wrote: >> Hi all, >> >> This pull request contains a backport of commit [b37f1236](https://github.com/openjdk/jdk/commit/b37f12362507fb2cd291a2b44b4777ba76efd35e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Damon Fenacci on 13 Jan 2025 and was reviewed by Tobias Hartmann and Vladimir Kozlov. >> >> Includes that were re-added by the original backout in these 2 files >> `src/hotspot/share/c1/c1_Compilation.hpp` >> `src/hotspot/share/c1/c1_IR.hpp` >> could not be cleanly applied but are not needed in the backport as the change happened after the jdk24 branch. >> >> Thanks! > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Update c1_IR.hpp > - Update c1_Compilation.hpp Looks good and trivial to me. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23065#pullrequestreview-2546410273 From chagedorn at openjdk.org Mon Jan 13 11:59:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 11:59:15 GMT Subject: RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value Message-ID: The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. A REDO can be done for JDK 25 @tabjy. Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. The backout applied cleanly. Thanks, Christian ------------- Commit messages: - Revert "8325495: C2: implement optimization for series of Add of unique value" Changes: https://git.openjdk.org/jdk/pull/23066/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23066&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347554 Stats: 414 lines in 3 files changed: 0 ins; 414 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23066.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23066/head:pull/23066 PR: https://git.openjdk.org/jdk/pull/23066 From thartmann at openjdk.org Mon Jan 13 12:01:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 12:01:51 GMT Subject: RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: <71IJHtoWyATCu5p5OV0qhLiXmNm5wBmq9IPcT0Xd4ZQ=.e7220730-3032-4850-a863-8a02d355ea8e@github.com> On Mon, 13 Jan 2025 11:53:43 GMT, Christian Hagedorn wrote: > The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. 
A REDO can be done for JDK 25 @tabjy. > > Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. > > The backout applied cleanly. > > Thanks, > Christian Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23066#pullrequestreview-2546420033 From chagedorn at openjdk.org Mon Jan 13 12:08:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 12:08:36 GMT Subject: RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 11:53:43 GMT, Christian Hagedorn wrote: > The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. A REDO can be done for JDK 25 @tabjy. > > Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. > > The backout applied cleanly. > > Thanks, > Christian Thanks Tobias! Once testing is complete, I will integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23066#issuecomment-2586928324 From jbhateja at openjdk.org Mon Jan 13 12:32:34 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 12:32:34 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: <0C5TGgbOGV0xQnTEbtqaZw00hJ9JJneIQrJ91pTHCDc=.e49fc21d-77bb-4825-a5ac-e67f8a0c082a@github.com> References: <0C5TGgbOGV0xQnTEbtqaZw00hJ9JJneIQrJ91pTHCDc=.e49fc21d-77bb-4825-a5ac-e67f8a0c082a@github.com> Message-ID: <4Xq-CpQQh-Oly6J4jhnvN49YG_gQMIu_PYjW0lDoJ9o=.04a4d890-529c-446c-ac45-ea9c291d2f96@github.com> On Mon, 13 Jan 2025 11:32:48 GMT, Tobias Hartmann wrote: > Thanks Jatin. 
In this case I would suggest that we should add a VM stress flag to give preference to EGPRs, or maybe [JDK-8343294](https://bugs.openjdk.org/browse/JDK-8343294) is good enough? > Technically, randomizing the allocation sequence within the same register class is not very useful. From the compiler's standpoint, register selection simply ensures that the LRG corresponding to a definition MachOper is never assigned a register that is already assigned to one of its neighbors in the interference graph, so it picks the next available free register when choosing the color. A change in allocation order will not impact the spilling behavior either. What I am doing is modifying the static allocation order so that we deterministically give preference to EGPRs over GPRs. This will ensure that all our APX-specific assembler support and special handling, like the one extended by this patch, are exercised thoroughly. > Do the existing tests trigger this even without `-XX:+SafepointALot`? Yes, but with -XX:+SafepointALot the crash is hit earlier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2586976104 From roland at openjdk.org Mon Jan 13 12:44:25 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Jan 2025 12:44:25 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v8] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks).
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loop. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depend on old predicates > added before the transformation. To solve this cicular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate. 
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an in counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - refactor - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - Merge branch 'master' into JDK-8342692 - review - reviews - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter - Merge branch 'master' into JDK-8342692 - whitespaces - ... and 19 more: https://git.openjdk.org/jdk/compare/3b9732ed...0f137359 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=07 Stats: 1308 lines in 24 files changed: 1252 ins; 16 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Mon Jan 13 12:44:26 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Jan 2025 12:44:26 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:46:08 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'master' into JDK-8342692 >> - whitespaces >> - more >> - merge >> - more >> - one more test >> - Merge branch 'master' into JDK-8342692 >> - more >> - more >> - Merge branch 'master' into JDK-8342692 >> - ... 
and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 > > src/hotspot/share/opto/loopnode.cpp line 1190: > >> 1188: get_template_assertion_predicates(parse_predicate_proj, list); >> 1189: clone_assertion_predicates(loop, list, ctrl->in(0)->as_ParsePredicate()); >> 1190: } > > You may want to talk with @chhagedorn to see if this cannot be done with less code-duplication. > Also: where are the `Unique_Node_List` allocated from / deallocated? Done in new commits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1913132596 From roland at openjdk.org Mon Jan 13 12:50:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Jan 2025 12:50:52 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v8] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 12:44:25 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. 
>> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 29 commits: > > - refactor > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - review > - reviews > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter > - Merge branch 'master' into JDK-8342692 > - whitespaces > - ... and 19 more: https://git.openjdk.org/jdk/compare/3b9732ed...0f137359 ![perf](https://github.com/user-attachments/assets/d1f41cbc-f68e-48bc-ab11-e6db1129ddbe) Performance of Maurizio's micro benchmark: red is without this patch, green is with the just updated version of the patch (which treats all loops that execute for less than `max_jint/max RC scale` as short running), blue is the previous version of the patch (short running loops run for less than `ShortLoopIter` iterations). The previous patch performs slightly better than the current one because restricting the iteration count of a short running loop further also allows removing the outer strip mined loop, but it has the drawback that the transformation may be applicable to fewer loops. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2587016221 From duke at openjdk.org Mon Jan 13 12:58:22 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 13 Jan 2025 12:58:22 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v3] In-Reply-To: References: Message-ID: > This patch enhances the MergeStores optimization to support merging values stored in reverse byte order.
>
> Below are the benchmark results before and after the patch:
>
> On aliyun g8y (aarch64)
> |name | before | after | ratio |
> |---|---|---|---|
> |MergeStoreBench.setCharBS |5669.655000 |5669.566000 | 0.00 %|
> |MergeStoreBench.setCharBV |5516.911000 |5516.273000 | 0.01 %|
> |MergeStoreBench.setCharC |5578.644000 |5552.809000 | 0.47 %|
> |MergeStoreBench.setCharLS |5782.140000 |5779.264000 | 0.05 %|
> |MergeStoreBench.setCharLV |5496.403000 |5499.195000 | -0.05 %|
> |MergeStoreBench.setIntB |6087.703000 |2768.385000 | 119.90 %|
> |MergeStoreBench.setIntBU |6733.813000 |2950.240000 | 128.25 %|
> |MergeStoreBench.setIntBV |1362.233000 |1361.821000 | 0.03 %|
> |MergeStoreBench.setIntL |2834.785000 |2833.042000 | 0.06 %|
> |MergeStoreBench.setIntLU |2947.145000 |2946.874000 | 0.01 %|
> |MergeStoreBench.setIntLV |5506.791000 |5506.229000 | 0.01 %|
> |MergeStoreBench.setIntRB |7634.279000 |5611.058000 | 36.06 %|
> |MergeStoreBench.setIntRBU |7766.737000 |5551.281000 | 39.91 %|
> |MergeStoreBench.setIntRL |5689.793000 |5689.385000 | 0.01 %|
> |MergeStoreBench.setIntRLU |5628.287000 |5628.789000 | -0.01 %|
> |MergeStoreBench.setIntRU |5536.039000 |5534.910000 | 0.02 %|
> |MergeStoreBench.setIntU |5595.363000 |5567.810000 | 0.49 %|
> |MergeStoreBench.setLongB |13722.671000 |6811.098000 | 101.48 %|
> |MergeStoreBench.setLongBU |13728.844000 |4280.240000 | 220.75 %|
> |MergeStoreBench.setLongBV |2785.255000 |2785.949000 | -0.02 %|
> |MergeStoreBench.setLongL |5714.615000 |5710.402000 | 0.07 %|
> |MergeStoreBench.setLongLU |4128.746000 |4129.324000 | -0.01 %|
> |MergeStoreBench.setLongLV |2793.125000 |2794.438000 | -0.05 %|
> |MergeStoreBench.setLongRB |14465.223000 |7015.050000 | 106.20 %|
> |MergeStoreBench.setLongRBU |14546.954000 |6173.210000 | 135.65 %|
> |MergeStoreBench.setLongRL |6816.145000 |6813.348000 | 0.04 %|
> |MergeStoreBench.setLongRLU |4289.445000 |4284.239000 | 0.12 %|
> |MergeStoreBench.setLongRU |3132.471000 |3133.093000 | -0.02 %|
> |MergeStoreBench.setLongU |3086.779000 |3087.298000 | -0.02 %|
>
> AMD EPYC 9T24
> |name | before | after | ratio |
> |---|---|---|---|
> |MergeStoreBench.setChar...

kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Update enum ValueOrder type ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23030/files - new: https://git.openjdk.org/jdk/pull/23030/files/4262b93c..5214a3b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23030&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23030.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23030/head:pull/23030 PR: https://git.openjdk.org/jdk/pull/23030 From duke at openjdk.org Mon Jan 13 13:10:40 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 13 Jan 2025 13:10:40 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v3] In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 11:18:50 GMT, Emanuel Peter wrote: > This looks very promising, thanks for working on this! Makes me very happy that people are extending it. > > I have a few comments and suggestions below. > > Can you please link the JBS issue to the other relevant RFE's for MergeStores? > > Is there no way to reverse shorts and ints? I related the JBS issue to https://bugs.openjdk.org/browse/JDK-8318446 (C2: optimize stores into primitive arrays by combining values into larger store). I think reversing shorts and ints can be done as well, but I think they are not as common as reversed byte order. We would also need to add shift and mask instructions, so the performance gain would be smaller.
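For readers following the thread, here is a minimal sketch of the kind of store pattern "reverse byte order" refers to. It is not taken from the patch or from `MergeStoreBench`: the class and method names are hypothetical, and whether C2 actually merges the eight byte stores depends on the JDK build and platform. The sketch only shows that the byte-by-byte form and a single wide store of the byte-reversed value produce the same memory contents.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; names are assumptions, not from the PR.
final class ReverseOrderStores {
    // Eight single-byte stores, most significant byte first (big-endian layout).
    static void storeBytewise(byte[] dst, int off, long v) {
        dst[off]     = (byte) (v >> 56);
        dst[off + 1] = (byte) (v >> 48);
        dst[off + 2] = (byte) (v >> 40);
        dst[off + 3] = (byte) (v >> 32);
        dst[off + 4] = (byte) (v >> 24);
        dst[off + 5] = (byte) (v >> 16);
        dst[off + 6] = (byte) (v >> 8);
        dst[off + 7] = (byte) v;
    }

    // The same memory contents written as one 8-byte little-endian store
    // of the byte-reversed value.
    static void storeReversed(byte[] dst, int off, long v) {
        ByteBuffer.wrap(dst)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .putLong(off, Long.reverseBytes(v));
    }
}
```

Both methods leave `dst` in the same state, which is the equivalence a merged store with a byte swap relies on.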
------------- PR Comment: https://git.openjdk.org/jdk/pull/23030#issuecomment-2587060751 From duke at openjdk.org Mon Jan 13 13:10:41 2025 From: duke at openjdk.org (kuaiwei) Date: Mon, 13 Jan 2025 13:10:41 GMT Subject: RFR: 8347405: MergeStores with reverse bytes order value [v2] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 03:58:41 GMT, Amit Kumar wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix as review comments > > src/hotspot/share/opto/memnode.cpp line 2799: > >> 2797: // Forward -> Forward >> 2798: // Backward -> Backward >> 2799: enum ValueOrder { Unknown, Forward, Backward }; > > can we update it to: > Suggestion: > > enum ValueOrder : uint8_t { Unknown, Forward, Backward }; The type is added. Thanks for suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23030#discussion_r1913166031 From mdoerr at openjdk.org Mon Jan 13 13:20:49 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 13 Jan 2025 13:20:49 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Thu, 2 Jan 2025 18:11:59 GMT, Sandhya Viswanathan wrote: >> @TheRealMDoerr The only problem arises when higher bank register (xmm16 and above) is used. For higher bank registers we need to use AVX512 instructions and KNL doesn't support variable vector length for AVX512 (i.e. it doesn't support avx512vl). So under those circumstances these instructions should use 512 bit vector length. Please find below a patch which could do that: >> [masm.patch](https://github.com/user-attachments/files/18282922/masm.patch) > >> @sviswa7: Thank you so much for your assistance. I have applied your proposal with Commit number 4. I think your patch missed some "else" statements. I've used the same condition `(!VM_Version::supports_avx512dq() || !VM_Version::supports_avx512vl())` for all. 
Please take a look and check if that makes sense: [643b010](https://github.com/openjdk/jdk/commit/643b0109e9aeb966a11fce76dff39ab052aa76c4) > > Yes, "else" was missing. Rest of your changes look good, I have only one comment above. Please take a look. @sviswa7: Can you approve this PR or ask one of your colleagues to review the x86_64 code, please? The original change (1st commit) was already reviewed by @rrich and @goetz. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2587082544 From qamai at openjdk.org Mon Jan 13 13:28:43 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 13 Jan 2025 13:28:43 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 06:20:08 GMT, Emanuel Peter wrote: > `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. > > However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). > > Why did we not hit this assert before? > `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. > > But now we found a second case: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 > > In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. 
We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 > > In the regression test, we have this example: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 > > We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. > > Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. > > I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: > > if (in(Init)->is_Con() && in(Limit)->is_Con()) > return nullptr; // Value > > The assumption seems to be that we want `Value` to do the constant folding here - but of course we di... You are right, but I believe the resulting expansion will constant fold regardless. So, why do we need to reject constant folding of the `LoopLimitNode` in the presence of overflow? 
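The overflow case quoted above can be reproduced with plain arithmetic. The sketch below is an assumption about the shape of the computation: it follows the `final_con == (jlong)final_int` check mentioned in the description, but the trip-count formula and all names are illustrative rather than copied from `LoopLimitNode::Value`. It evaluates both `Phi` inputs from the regression test example, with `init=0` and `stride=4`.

```java
// Hypothetical illustration class; the formula is an assumed reading of the thread.
final class LoopLimitSketch {
    // Final induction variable value, computed in long arithmetic so that
    // an int overflow shows up as a value outside the int range.
    static long finalValue(int init, int limit, int stride) {
        long strideM = stride - (stride > 0 ? 1 : -1);
        long tripCount = ((long) limit - init + strideM) / stride;
        return init + stride * tripCount;
    }

    // Mirrors the "final_con == (jlong)final_int" style of check.
    static boolean fitsInInt(long value) {
        return value == (long) (int) value;
    }
}
```

With `limit=1000` the final value is 1000 and fits in an int. With `limit=2147483647` the final value is 2147483648, which does not fit, so that branch is the one where constant folding has to be refused or handled specially.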
------------- PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2587100205 From qamai at openjdk.org Mon Jan 13 13:30:50 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 13 Jan 2025 13:30:50 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v8] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 12:44:25 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. 
Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: > > - refactor > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - Merge branch 'master' into JDK-8342692 > - review > - reviews > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter > - Merge branch 'master' into JDK-8342692 > - whitespaces > - ... and 19 more: https://git.openjdk.org/jdk/compare/3b9732ed...0f137359 Can you inject the iteration count into the created loop so that it can avoid strip mining? 
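As context for the discussion, here is a minimal sketch of the loop shape the PR description talks about: a loop with a `long` induction variable and a long range check that in practice runs for only a few iterations. The class and method names are hypothetical, and using `Objects.checkIndex(long, long)` as the range check is an assumption made for illustration. Whether C2 skips the loop nest for such a loop depends on the build, the flags, and the collected profile.

```java
import java.util.Objects;

// Hypothetical illustration class; not taken from the PR or its tests.
final class ShortLongLoop {
    // Long counted loop with a long range check on every iteration.
    static long sum(long[] data, long start, long stop) {
        long sum = 0;
        for (long i = start; i < stop; i++) {
            // Long range check; throws IndexOutOfBoundsException if i is out of range.
            int idx = (int) Objects.checkIndex(i, (long) data.length);
            sum += data[idx];
        }
        return sum;
    }
}
```

Calling `sum` on a 100-element array runs 100 iterations, well under the `ShortLoopIter` default of 1000 mentioned in the description.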
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2587105950 From thartmann at openjdk.org Mon Jan 13 13:48:47 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 13:48:47 GMT Subject: [jdk24] Integrated: 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic In-Reply-To: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> References: <-9ijGzFfYh1nwsa9JmJPDLbujCJUTAwUNffcu0XcA1g=.346d2d7a-49fa-4e44-ad63-948ae96550a3@github.com> Message-ID: On Mon, 13 Jan 2025 09:57:01 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [82e2a791](https://github.com/openjdk/jdk/commit/82e2a791225a289ba32360bf415274c4b48b9e00) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 13 Jan 2025 and was reviewed by Roland Westrelin, Quan Anh Mai and Vladimir Kozlov. > > Thanks! This pull request has now been integrated. 
Changeset: da74fbd9 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/da74fbd920cdcd16f6097fcb8488a061f4753be5 Stats: 58 lines in 5 files changed: 14 ins; 0 del; 44 mod 8347006: LoadRangeNode floats above array guard in arraycopy intrinsic Reviewed-by: chagedorn Backport-of: 82e2a791225a289ba32360bf415274c4b48b9e00 ------------- PR: https://git.openjdk.org/jdk/pull/23063 From chagedorn at openjdk.org Mon Jan 13 14:06:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 14:06:10 GMT Subject: RFR: 8347018: C2: assert(find_block_for_node(self->in(0)) == early) failed: The home of a memory writer must also be its earliest placement In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 13:42:43 GMT, Christian Hagedorn wrote: > ## Failing Assert > > The failing assert in `PhaseCFG::schedule_late()` checks the following: > https://github.com/openjdk/jdk/blob/55c6904e8f3d02530749bf28f2cc966e8983a984/src/hotspot/share/opto/gcm.cpp#L1431-L1438 > > In the test case, this is violated for `87 storeI`: > ![image](https://github.com/user-attachments/assets/6844e060-69c4-436e-ac47-c14ec59ee4d2) > > The early block `early` for `87 storeI` is bound by `115 loadI` pinned at `161 Region` which is dominated by the control input `146 Region` of `87 storeI`. This lets the assert fail. > > ## How Did `115 loadI` End up Being Pinned below `87 storeI`? > ### Before Pre/Main/Post Loop Creation > Before the creation of pre/main/post loops, we have the following graph: > > ![image](https://github.com/user-attachments/assets/3f6e6b92-5194-4efd-89a9-246a977e3022) > > Everything looks fine: The control input of `312 StoreI` (which is eventually cloned and becomes `87 storeI` in the Mach graph) corresponds to the early placement of the store. `415 LoadI` was hoisted out of the loop during Loop Predication and is pinned above at a Template Assertion Predicate. 
> > ### Pre/Main/Post Loop Creation > #### Post Loop Body Creation > During the creation of pre/main/post loops, we clone the main loop body for the post loop body: > > ![image](https://github.com/user-attachments/assets/e8de3d6d-34c3-46fd-abd4-df212e2e77fb) > > We notice that `312 StoreI` is pinned on the main loop backedge. When finishing the last iteration from the main loop and possibly continuing in the post loop, we need to feed everything on the loop backedge of the main loop to the post loop. However, the pinned nodes on the main loop backedge cannot float. Therefore, we need to create new copies of these pinned nodes with `PhaseIdealLoop::clone_up_backedge_goo()`. > > The pins are updated to the entry of the post loop. All inputs into these pinned nodes that have their current control (fetched with `get_ctrl()`) on the main loop backedge as well are also cloned but keep their control inputs (if any) if it's not the loop backedge. > > In our example, this applies to `453 StoreI` -> `479 StoreI`, and some inputs recursively (`454 AddI` -> `482 AddI`, `481 LoadI` -> `541 Load`): > > ![image](https://github.com/user-attachments/assets/0d4b818b-5c23-4ad9-b474-976c352f4c88) > > Still, all looks fine. Notice that the clone `481 LoadI` of `455 LoadI` is currently still pinned at the same Template Assertion. > > #### Assertion Predicate Creation > In the next step, we create new Assertion Predicates at th... src/hotspot/share/opto/loopTransform.cpp line 1453: > 1451: // the if branch that enters the loop, between the input induction > 1452: // variable value and the induction variable Phi to preserve correct > 1453: // dependencies. Noticed that this comment block should have been removed earlier with [JDK-8334724](https://bugs.openjdk.org/browse/JDK-8334724) which removed the cast node. I squeezed this in here - probably not worth a separate task. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23071#discussion_r1913215768 From chagedorn at openjdk.org Mon Jan 13 14:06:10 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 14:06:10 GMT Subject: RFR: 8347018: C2: assert(find_block_for_node(self->in(0)) == early) failed: The home of a memory writer must also be its earliest placement Message-ID: ## Failing Assert The failing assert in `PhaseCFG::schedule_late()` checks the following: https://github.com/openjdk/jdk/blob/55c6904e8f3d02530749bf28f2cc966e8983a984/src/hotspot/share/opto/gcm.cpp#L1431-L1438 In the test case, this is violated for `87 storeI`: ![image](https://github.com/user-attachments/assets/6844e060-69c4-436e-ac47-c14ec59ee4d2) The early block `early` for `87 storeI` is bound by `115 loadI` pinned at `161 Region` which is dominated by the control input `146 Region` of `87 storeI`. This lets the assert fail. ## How Did `115 loadI` End up Being Pinned below `87 storeI`? ### Before Pre/Main/Post Loop Creation Before the creation of pre/main/post loops, we have the following graph: ![image](https://github.com/user-attachments/assets/3f6e6b92-5194-4efd-89a9-246a977e3022) Everything looks fine: The control input of `312 StoreI` (which is eventually cloned and becomes `87 storeI` in the Mach graph) corresponds to the early placement of the store. `415 LoadI` was hoisted out of the loop during Loop Predication and is pinned above at a Template Assertion Predicate. ### Pre/Main/Post Loop Creation #### Post Loop Body Creation During the creation of pre/main/post loops, we clone the main loop body for the post loop body: ![image](https://github.com/user-attachments/assets/e8de3d6d-34c3-46fd-abd4-df212e2e77fb) We notice that `312 StoreI` is pinned on the main loop backedge. When finishing the last iteration from the main loop and possibly continuing in the post loop, we need to feed everything on the loop backedge of the main loop to the post loop. 
However, the pinned nodes on the main loop backedge cannot float. Therefore, we need to create new copies of these pinned nodes with `PhaseIdealLoop::clone_up_backedge_goo()`. The pins are updated to the entry of the post loop. All inputs into these pinned nodes that also have their current control (fetched with `get_ctrl()`) on the main loop backedge are cloned as well, but they keep their control input (if any) if it is not the loop backedge. In our example, this applies to `453 StoreI` -> `479 StoreI`, and some inputs recursively (`454 AddI` -> `482 AddI`, `481 LoadI` -> `541 Load`): ![image](https://github.com/user-attachments/assets/0d4b818b-5c23-4ad9-b474-976c352f4c88) Still, all looks fine. Notice that the clone `481 LoadI` of `455 LoadI` is currently still pinned at the same Template Assertion Predicate. #### Assertion Predicate Creation In the next step, we create new Assertion Predicates at the post loop and rewire any data nodes control dependent on Assertion Predicates down to the post loop - including the new `481 LoadI` from `PhaseIdealLoop::clone_up_backedge_goo()`: ![image](https://github.com/user-attachments/assets/da529f25-de01-4ebc-a051-e9a8132f92d3) This creates the graph shape with which we later fail during scheduling in the backend: The control input of `479 StoreI` is further up in the graph than the actual early block, which is limited by `481 LoadI` pinned at `493 IfTrue`. ## Same Problem with `clone_up_backedge_goo()` for Main Loop? The very same problem could theoretically also be observed for the main loop when creating the pre loop. But it is not, due to how we implemented the rewiring of data nodes when creating new Assertion Predicates: After the pre loop is created, the old Assertion Predicates are above the pre loop and actually need to be established at the main loop. Therefore, all data nodes control dependent on Assertion Predicates and belonging to the main loop need to be rewired.
In our test case, this is `415 LoadI` (original node) and `540 LoadI` (node cloned by `clone_up_backedge_goo()` that actually belongs to the main loop): ![image](https://github.com/user-attachments/assets/5898aa1a-fc38-45f8-9573-5b3c982202b3) ### Check If Data Belongs to Main Loop Since the pre loop only contains cloned nodes, we do the following trick to determine if a node belongs to the main loop (implemented [here](https://github.com/chhagedorn/jdk/blob/3b9732edc6dd22868634166678d220bf1066e5be/src/hotspot/share/opto/predicates.hpp#L964-L973)): Store index IDX for the next newly created node just before pre loop creation. For any data node dependency n: Is index of n < IDX? -> Not a node in the pre loop. Is there a clone of n with index >= IDX? -> Clone is in pre loop and thus original node in main loop. ### Cloned Nodes with `clone_up_backedge_goo()` Mess with "Node inside Main Loop" Check Since the cloned nodes in `clone_up_backedge_goo()` are originally from pre loop nodes, our check will fail and we do not rewire these nodes, even though they belong to the main loop: "540 LoadI < IDX" does not hold => we conclude 540 LoadI is a cloned node belonging to the pre loop and not the main loop. Applied to our test case, we have the following after `clone_up_backedge_goo()`: ![image](https://github.com/user-attachments/assets/2216f446-cef3-4f66-80a1-f07012160c59) We can see that `540 LoadI`, cloned by `clone_up_backedge_goo()`, is still pinned before the pre loop because we have not rewired it and thus scheduling does not fail with the assert. Even though I could not trigger a failure, I think it is an incorrect pin since the `540 LoadI` belongs to the main loop. ## Proposed Fix - Rewire any nodes created by `clone_up_backedge_goo()` which are pinned to the original loop entry before Assertion Predication to the new loop entry after Assertion Predicate creation. The new loop entry will be the tail of the last Assertion Predicate (if any).
- Update data node rewiring in Assertion Predication processing to also consider nodes from `clone_up_backedge_goo()` correctly. I've implemented a new `NodeInMainLoopBody` class for that purpose. ### Why not just Add Assertion Predicates First? This does not work in a straightforward way because we do not know the init value before applying `clone_up_backedge_goo()`, which is interleaved with updating the phi nodes. I've decided to go with the proposed fix instead. ## Testing: - tier1-7 - hs-precheckin-comp - hs-comp-stress ## Deferring to JDK 25? This seems to be an edge case (only found with fuzzing) and it's not entirely clear to me what the impact on product builds is. However, this is a regression in JDK 24 and should be considered to be fixed in JDK 24. But this fix became somewhat more complex to understand and implement. First applying the fix to JDK 25, letting it bake and then only considering it for an update release of JDK 25 could be a possible option I think. Opinions are welcome. Thanks, Christian ------------- Commit messages: - New NodeInMainLoopBody - 8347018: C2: assert(find_block_for_node(self->in(0)) == early) failed: The home of a memory writer must also be its earliest placement Changes: https://git.openjdk.org/jdk/pull/23071/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23071&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347018 Stats: 154 lines in 4 files changed: 134 ins; 9 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/23071.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23071/head:pull/23071 PR: https://git.openjdk.org/jdk/pull/23071 From epeter at openjdk.org Mon Jan 13 14:20:46 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 13 Jan 2025 14:20:46 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 13:25:38 GMT, Quan Anh Mai wrote: >> `LoopLimitNode::Value` tries to constant-fold when it has constant inputs.
However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. >> >> However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). >> >> Why did we not hit this assert before? >> `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. >> >> But now we found a second case: >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 >> >> In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 >> >> In the regression test, we have this example: >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 >> >> We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. >> >> Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? 
>> We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. >> >> I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: >> >> if (in(Init)->is_Con() && in(Limit)->is_Con()) >> return nullptr; // Value >> >> The assumption seems to be that we want `Value`... > > You are right, but I believe the resulting expansion will constant fold regardless. So, why do we need to reject constant folding of the `LoopLimitNode` in the presence of overflow? @merykitty You are probably right, we could probably just constant-fold in `Value`. I mean is it not a little strange anyway: why do we have the optimization twice: once in `Value` and then the lowering in `Ideal`. Feels a little like duplication. We could investigate this in a follow-up RFE, what do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2587226316 From chagedorn at openjdk.org Mon Jan 13 14:22:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 14:22:52 GMT Subject: RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 11:53:43 GMT, Christian Hagedorn wrote: > The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. A REDO can be done for JDK 25 @tabjy. > > Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. > > The backout applied cleanly. > > Thanks, > Christian Testing was clean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23066#issuecomment-2587229076 From chagedorn at openjdk.org Mon Jan 13 14:22:52 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 14:22:52 GMT Subject: Integrated: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 11:53:43 GMT, Christian Hagedorn wrote: > The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. A REDO can be done for JDK 25 @tabjy. > > Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. > > The backout applied cleanly. > > Thanks, > Christian This pull request has now been integrated. Changeset: 062f2dcf Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a Stats: 414 lines in 3 files changed: 0 ins; 414 del; 0 mod 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/23066 From chagedorn at openjdk.org Mon Jan 13 14:24:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 14:24:56 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v16] In-Reply-To: <89yHic-uESOnHEWlgbGBFANceqk6mF6qvPRHoHv9niw=.05ca4b72-3e43-4bb6-921d-8d90e994c823@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> <89yHic-uESOnHEWlgbGBFANceqk6mF6qvPRHoHv9niw=.05ca4b72-3e43-4bb6-921d-8d90e994c823@github.com> Message-ID: <9y7eTACVxtTUIFQ6XIW9Gq81NWUI-hL1Y5_y9degkGg=.3005dd91-7ca3-425d-aee9-2b1eca31e200@github.com> On Fri, 10 Jan 2025 09:01:16 GMT, Theo Weidmann wrote: >> This PR is a refactoring and partial 
rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: >> >>> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >>> >>> I would like to start a collection of such generators, that can then be used in testing. >>> >>> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >>> >>> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >>> >>> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 >> >> The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it to work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. > > Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: > > Fix spacing Drive-by comment: You should update the copyright years to 2025. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22941#issuecomment-2587237243 From dfenacci at openjdk.org Mon Jan 13 14:28:38 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 13 Jan 2025 14:28:38 GMT Subject: RFR: 8347481: C2: Remove the control input of some nodes In-Reply-To: References: Message-ID: On Sun, 12 Jan 2025 13:43:44 GMT, Quan Anh Mai wrote: > Hi, > > While working on [JDK-8347365](https://bugs.openjdk.org/browse/JDK-8347365), I noticed that there are some nodes whose control inputs are set in a seemingly erroneous manner.
This patch removes the control inputs for those nodes. > > Please review this PR, thanks a lot. Nice cleanup @merykitty. Thanks! I was just wondering if there could be more nodes where the control input is wrongly set (possibly enough for a followup issue?) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23055#issuecomment-2587247540 From thartmann at openjdk.org Mon Jan 13 14:37:39 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 13 Jan 2025 14:37:39 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: <4Xq-CpQQh-Oly6J4jhnvN49YG_gQMIu_PYjW0lDoJ9o=.04a4d890-529c-446c-ac45-ea9c291d2f96@github.com> References: <0C5TGgbOGV0xQnTEbtqaZw00hJ9JJneIQrJ91pTHCDc=.e49fc21d-77bb-4825-a5ac-e67f8a0c082a@github.com> <4Xq-CpQQh-Oly6J4jhnvN49YG_gQMIu_PYjW0lDoJ9o=.04a4d890-529c-446c-ac45-ea9c291d2f96@github.com> Message-ID: On Mon, 13 Jan 2025 12:28:29 GMT, Jatin Bhateja wrote: > Technically, randomizing allocation sequence from the same register class is not very useful It's definitely useful to catch bugs where silent register corruption is never detected because that exact register is not used anywhere else. Or for cases where a specific register is not saved although it should be but we don't notice because that register is not used anywhere else. And we had quite a few bugs like this in the past. Wouldn't such a random register selection also randomly prioritize EGPRs and therefore do the same thing that you did manually? And if not, doesn't that mean that we should definitely add such a stress flag? > Yes, but with -XX:+SafepointALot crash hit early. Okay, good then. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2587269757 From roland at openjdk.org Mon Jan 13 14:53:56 2025 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 13 Jan 2025 14:53:56 GMT Subject: RFR: 8333697: C2: Hit MemLimit in PhaseCFG::global_code_motion Message-ID: I investigated the failure from the `Test.java` that's attached to the bug. The failure with this test is only reproducible up to 8334060 (Implementation of Late Barrier Expansion for G1), so the experiments I describe here are from the source code for the commit right before it. Peak malloc memory usage reported by NMT is: 1.3GB `PhaseCFG::global_code_motion()`, when `OptoRegScheduling` is true, creates a `PhaseIFG` that, when initialized, allocates `_adjs`: an array of `maxlrg` `IndexSet`s, each of which can contain up to `maxlrg` elements. `maxlrg` in this case is 122839. An `IndexSet` is an array of pointers to 256 bit bitsets: one `IndexSet` array needs 122839 / 256 * 8 = 3832 bytes, and there are 122839 of them: 3832 * 122839 = ~470 MB It turns out the `PhaseIFG` object, when used from `PhaseCFG::global_code_motion()`, doesn't even use the `_adjs` array. So a patch like:

diff --git a/src/hotspot/share/opto/chaitin.hpp b/src/hotspot/share/opto/chaitin.hpp
index cf02deb6019..4e5333bf181 100644
--- a/src/hotspot/share/opto/chaitin.hpp
+++ b/src/hotspot/share/opto/chaitin.hpp
@@ -258,7 +258,7 @@ class PhaseIFG : public Phase {
   VectorSet *_yanked;
 
   PhaseIFG( Arena *arena );
-  void init( uint maxlrg );
+  void init( uint maxlrg, bool no_adjs = false );
 
   // Add edge between a and b. Returns true if actually added.
   int add_edge( uint a, uint b );
diff --git a/src/hotspot/share/opto/gcm.cpp b/src/hotspot/share/opto/gcm.cpp
index ebdefe597ff..fefd75a88c5 100644
--- a/src/hotspot/share/opto/gcm.cpp
+++ b/src/hotspot/share/opto/gcm.cpp
@@ -1704,7 +1704,9 @@ void PhaseCFG::global_code_motion() {
     rm_live.reset_to_mark(); // Reclaim working storage
     IndexSet::reset_memory(C, &live_arena);
     uint node_size = regalloc._lrg_map.max_lrg_id();
-    ifg.init(node_size); // Empty IFG
+    ifg.init(node_size, true); // Empty IFG
     regalloc.set_ifg(ifg);
     regalloc.set_live(live);
     regalloc.gather_lrg_masks(false); // Collect LRG masks
diff --git a/src/hotspot/share/opto/ifg.cpp b/src/hotspot/share/opto/ifg.cpp
index d12698121b9..e42121c2254 100644
--- a/src/hotspot/share/opto/ifg.cpp
+++ b/src/hotspot/share/opto/ifg.cpp
@@ -42,18 +42,24 @@ PhaseIFG::PhaseIFG( Arena *arena ) : Phase(Interference_Graph), _arena(arena) {
 }
 
-void PhaseIFG::init( uint maxlrg ) {
+void PhaseIFG::init( uint maxlrg, bool no_adjs ) {
   _maxlrg = maxlrg;
   _yanked = new (_arena) VectorSet(_arena);
   _is_square = false;
   // Make uninitialized adjacency lists
-  _adjs = (IndexSet*)_arena->Amalloc(sizeof(IndexSet)*maxlrg);
+  if (no_adjs) {
+    _adjs = nullptr;
+  } else {
+    _adjs = (IndexSet*)_arena->Amalloc(sizeof(IndexSet)*maxlrg);
+  }
   // Also make empty live range structures
   _lrgs = (LRG *)_arena->Amalloc( maxlrg * sizeof(LRG) );
   memset((void*)_lrgs,0,sizeof(LRG)*maxlrg);
   // Init all to empty
   for( uint i = 0; i < maxlrg; i++ ) {
-    _adjs[i].initialize(maxlrg);
+    if (_adjs != nullptr) {
+      _adjs[i].initialize(maxlrg);
+    }
     _lrgs[i].Set_All();
   }
 }

saves a lot of memory. NMT reports peak malloc memory to be 810 MB then (that saves a bit more than 470MB, not sure why). Instead of the fix above, I propose lazily allocating the array of pointers to bitsets that `IndexSet` uses (`_blocks` field). The reasons for that are: - for the `PhaseIFG` issue, it pretty much has the same effect as the patch above (the 470 MB of arrays are not allocated).
- `PhaseLive` uses arrays of `IndexSet`s as well, one per block. The compilation has 20961 blocks. That's: 122839 / 256 * 8 * 20961 = ~80MB total of bitblock arrays. I noticed that at least one of the `IndexSet` arrays contains a large proportion of empty `IndexSet`s (the `_defs` field of `PhaseLive`). Lazily allocating bitset arrays is an easy way to save some extra memory. With the patch for this PR, peak malloc memory is reported by NMT to be: 670MB. That's misleading: that number comes from a repo that doesn't have the fix for 8345287 (C2: live in computation is broken) and so has no live ins. Given that the patch causes storage for `IndexSet`s to be lazily allocated and the bug tracked by 8345287 causes live ins to stay empty, we end up saving space for live ins that should be populated: that's an extra 100MB. So actual peak storage is ~770MB and there's a small extra improvement compared to the 810MB of the patch above. Compilation speed doesn't seem to be affected by this change. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/23075/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23075&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8333697 Stats: 60 lines in 3 files changed: 36 ins; 15 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23075.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23075/head:pull/23075 PR: https://git.openjdk.org/jdk/pull/23075 From qamai at openjdk.org Mon Jan 13 14:55:41 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 13 Jan 2025 14:55:41 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 06:20:08 GMT, Emanuel Peter wrote: > `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case.
> > However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). > > Why did we not hit this assert before? > `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. > > But now we found a second case: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 > > In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 > > In the regression test, we have this example: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 > > We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. > > Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. 
> > I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: > > if (in(Init)->is_Con() && in(Limit)->is_Con()) > return nullptr; // Value > > The assumption seems to be that we want `Value` to do the constant folding here - but of course we di... I agree, please go ahead. ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/23024#pullrequestreview-2546872357 From chagedorn at openjdk.org Mon Jan 13 15:17:02 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 13 Jan 2025 15:17:02 GMT Subject: [jdk24] RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value Message-ID: Hi all, This pull request contains a backport of commit [062f2dcf](https://github.com/openjdk/jdk/commit/062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Christian Hagedorn on 13 Jan 2025 and was reviewed by Tobias Hartmann. Thanks! 
------------- Commit messages: - Backport 062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a Changes: https://git.openjdk.org/jdk/pull/23077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347554 Stats: 414 lines in 3 files changed: 0 ins; 414 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23077/head:pull/23077 PR: https://git.openjdk.org/jdk/pull/23077 From jbhateja at openjdk.org Mon Jan 13 15:19:42 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 13 Jan 2025 15:19:42 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: References: <0C5TGgbOGV0xQnTEbtqaZw00hJ9JJneIQrJ91pTHCDc=.e49fc21d-77bb-4825-a5ac-e67f8a0c082a@github.com> <4Xq-CpQQh-Oly6J4jhnvN49YG_gQMIu_PYjW0lDoJ9o=.04a4d890-529c-446c-ac45-ea9c291d2f96@github.com> Message-ID: <3emAt5aLITW3xyu9QMw6JqqLyEs0gD51Y7Gjvhhdk4A=.20437286-f39b-4316-b625-9395be75dc6e@github.com> On Mon, 13 Jan 2025 14:34:40 GMT, Tobias Hartmann wrote: > > Technically, randomizing allocation sequence from the same register class is not very useful > > It's definitely useful to catch bugs where silent register corruption is never detected because that exact register is not used anywhere else. Or for cases where a specific register is not saved although it should be but we don't notice because that register is not used anywhere else. And we had quite a few bugs like this in the past. > I see, thanks @TobiHartmann. The scope of JDK-8343294 can be extended to cover EGPRs to catch such silent bugs; to stress-validate all JVM components, I am taking a brute-force approach to begin with. You can assign JDK-8343294 to me if Daniel is not active on that. > Wouldn't such a random register selection also randomly prioritize EGPRs and therefore do the same thing that you did manually? And if not, doesn't that mean that we should definitely add such a stress flag?
> > > Yes, but with -XX:+SafepointALot crash hit early. > > Okay, good then. Let me know if this looks ok for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2587392331 From kxu at openjdk.org Mon Jan 13 15:29:45 2025 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 13 Jan 2025 15:29:45 GMT Subject: RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 11:53:43 GMT, Christian Hagedorn wrote: > The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. A REDO can be done for JDK 25 @tabjy. > > Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. > > The backout applied cleanly. > > Thanks, > Christian I'll look into this. Sorry for making you roll back the changes!
------------- PR Comment: https://git.openjdk.org/jdk/pull/23066#issuecomment-2587429951 From dlunden at openjdk.org Mon Jan 13 15:35:47 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 13 Jan 2025 15:35:47 GMT Subject: RFR: 8347422: Crash during safepoint handler execution with -XX:+UseAPX In-Reply-To: <3emAt5aLITW3xyu9QMw6JqqLyEs0gD51Y7Gjvhhdk4A=.20437286-f39b-4316-b625-9395be75dc6e@github.com> References: <0C5TGgbOGV0xQnTEbtqaZw00hJ9JJneIQrJ91pTHCDc=.e49fc21d-77bb-4825-a5ac-e67f8a0c082a@github.com> <4Xq-CpQQh-Oly6J4jhnvN49YG_gQMIu_PYjW0lDoJ9o=.04a4d890-529c-446c-ac45-ea9c291d2f96@github.com> <3emAt5aLITW3xyu9QMw6JqqLyEs0gD51Y7Gjvhhdk4A=.20437286-f39b-4316-b625-9395be75dc6e@github.com> Message-ID: <5P3oKxmxuWEsT_6qx3cWMr7t8aJV-dVVxIxRQRUh4rs=.fecdfa20-945d-46a7-b871-8fd1a5964ee3@github.com> On Mon, 13 Jan 2025 15:16:55 GMT, Jatin Bhateja wrote: >>> Technically, randomizing allocation sequence from the same register class is not very useful >> >> It's definitely useful to catch bugs where silent register corruption is never detected because that exact register is not used anywhere else. Or for cases where a specific register is not saved although it should be but we don't notice because that register is not used anywhere else. And we had quite a few bugs like this in the past. >> >> Wouldn't such a random register selection also randomly prioritize EGPRs and therefore do the same thing that you did manually? And if not, doesn't that mean that we should definitely add such a stress flag? >> >>> Yes, but with -XX:+SafepointALot crash hit early. >> >> Okay, good then. > >> > Technically, randomizing allocation sequence from the same register class is not very useful >> >> It's definitely useful to catch bugs where silent register corruption is never detected because that exact register is not used anywhere else. 
Or for cases where a specific register is not saved although it should be but we don't notice because that register is not used anywhere else. And we had quite a few bugs like this in the past. >> > > I see, Thanks @TobiHartmann, scope of JDK- 8343294 can be extended to cover EGPRs to catch such silent bugs, to strees validating all JVM components I am taking a brute force approach to begin with. You can assign JDK-8343294 to me if Daniel is not active on that. > >> Wouldn't such a random register selection also randomly prioritize EGPRs and therefore do the same thing that you did manually? And if not, doesn't that mean that we should definitely add such a stress flag? >> >> > Yes, but with -XX:+SafepointALot crash hit early. >> >> Okay, good then. > > Let me know if this looks ok for now. Hi @jatin-bhateja. Feel free to have a look at JDK-8343294 (and assign yourself). Please ping me for the review when the changeset is ready! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23035#issuecomment-2587445078 From psandoz at openjdk.org Mon Jan 13 16:53:45 2025 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 13 Jan 2025 16:53:45 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: References: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> Message-ID: On Mon, 13 Jan 2025 09:02:24 GMT, Jatin Bhateja wrote: >> Uncertain on what bits, but i am guessing it's mostly related to the fallback code in the lambda. To avoid the intrinsics operating on Float16 instances we instead "unpack" the carrier (16bits) values and pass those as arguments to the intrinsic. The fallback (when intrinsification is not supported) also accepts those carrier values as arguments and we convert the carriers to floats, operate on then, convert to the carrier, and then back to float16 on the result. 
>> >> The code in the lambda could potentially be simplified if `Float16Math.fma` accepted six arguments the first three being the carrier values used by the intrinsic, and the subsequent three being the float16 values used by the fallback. Then we could express the code in the original source in the lambda. I believe when intrinsified there would be no penalty for those extra arguments. > > Hi @PaulSandoz , In the current scheme we are passing unboxed carriers to intrinsic entry point, in the fallback implementation carrier type is first converted to floating point value using Float.float16ToFloat API which expects to receive a short type argument, after the operation we again convert float value to carrier type (short) using Float.floatToFloat16 API which expects a float argument, thus our intent here is to perform unboxing and boxing outside the intrinsic thereby avoiding all complexities around boxing by compiler. Even if we pass 3 additional parameters we still need to use Float16.floatValue which invokes Float.float16ToFloat underneath, thus this minor modification on Java side is on account of optimizing the intrinsic interface. Yes, i understand the approach. 
It's about clarity of the fallback implementation retaining what was expressed in the original code: short res = Float16Math.fma(fa, fb, fc, a, b, c, (a_, b_, c_) -> { double product = (double)(a_.floatValue() * b_.floatValue()); return valueOf(product + c_.doubleValue()); }); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1913502565 From sviswanathan at openjdk.org Mon Jan 13 17:50:45 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 13 Jan 2025 17:50:45 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> References: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> Message-ID: On Fri, 10 Jan 2025 23:03:20 GMT, Martin Doerr wrote: >> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c >> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. >> >> Testing: >> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" >> All passed. >> >> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): >> >> `lir_abs` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpandnd %zmm0,%zmm1,%zmm0 >> >> >> `lir_neg` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpxord %zmm0,%zmm1,%zmm0 >> >> >> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) >> >> `lir_abs` with this patch: >> >> vandpd 0xa1b213d(%rip),%xmm0,%xmm0 >> >> >> `lir_neg` with this patch: >> >> vxorpd 0xa12585d(%rip),%xmm0,%xmm0 >> >> >> New code is faster on our machine (using -XX:+UseKNLSetting). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright year.
Changes in commit 2 - commit 6 look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22709#pullrequestreview-2547356929 From kvn at openjdk.org Mon Jan 13 17:58:35 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Jan 2025 17:58:35 GMT Subject: [jdk24] RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> Message-ID: On Mon, 13 Jan 2025 11:58:21 GMT, Damon Fenacci wrote: >> Hi all, >> >> This pull request contains a backport of commit [b37f1236](https://github.com/openjdk/jdk/commit/b37f12362507fb2cd291a2b44b4777ba76efd35e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Damon Fenacci on 13 Jan 2025 and was reviewed by Tobias Hartmann and Vladimir Kozlov. >> >> Includes that were re-added by the original backout in these 2 files >> `src/hotspot/share/c1/c1_Compilation.hpp` >> `src/hotspot/share/c1/c1_IR.hpp` >> could not be cleanly applied but are not needed in the backport as the change happened after the jdk24 branch. >> >> Thanks! > > Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: > > - Update c1_IR.hpp > - Update c1_Compilation.hpp Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23065#pullrequestreview-2547373534 From kvn at openjdk.org Mon Jan 13 18:01:48 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Jan 2025 18:01:48 GMT Subject: [jdk24] RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 15:12:07 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [062f2dcf](https://github.com/openjdk/jdk/commit/062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Christian Hagedorn on 13 Jan 2025 and was reviewed by Tobias Hartmann. > > Thanks! Good. Trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23077#pullrequestreview-2547382665 PR Comment: https://git.openjdk.org/jdk/pull/23077#issuecomment-2587804597 From kvn at openjdk.org Mon Jan 13 18:17:53 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Jan 2025 18:17:53 GMT Subject: RFR: 8344130: C2: Avoid excessive hoisting in scheduler due to minuscule differences in block frequency In-Reply-To: References: Message-ID: On Wed, 18 Dec 2024 13:41:41 GMT, Daniel Lundén wrote: > `PhaseCFG::is_cheaper_block` can sometimes excessively hoist instructions through blocks due to minuscule differences in block frequency, even when the differences are likely caused by numerical imprecision in the block frequency computations. We saw an example of where such excessive hoisting stressed the register allocator in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295), but that issue was in fact two issues: one in the matcher (solved in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295)) and one in the scheduler (this issue). > > ### Changeset > > Add a small delta to the frequency comparison in `PhaseCFG::is_cheaper_block`.
Note that a frequency comparison using the delta is already available in the function when making sure a hoist due to latency does not result in a higher (worse) frequency. I cannot see any reason for why we should not use the same delta in the first block frequency comparison. > > I do not include a regression test since I have not found a good one specific to this issue. I have verified that this fix is an alternative solution to solve the failure in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) (for which tests are already present). I also documented the verification steps in the issue description in JBS. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/12181425502) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing using DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No significant improvements nor regressions. Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22810#pullrequestreview-2547433575 From mdoerr at openjdk.org Mon Jan 13 20:06:52 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 13 Jan 2025 20:06:52 GMT Subject: RFR: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> References: <7nB330iJ42qBoROI2rfnlDj8UYaG469KueJLTfWTEng=.ec85bfcc-11f6-4c68-b076-08e4410d8a2b@github.com> Message-ID: On Fri, 10 Jan 2025 23:03:20 GMT, Martin Doerr wrote: >> 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c >> 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. 
>> >> Testing: >> make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" >> All passed. >> >> This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): >> >> `lir_abs` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpandnd %zmm0,%zmm1,%zmm0 >> >> >> `lir_neg` without this patch: >> >> vmovsd -0x63(%rip),%xmm1 >> vpxord %zmm0,%zmm1,%zmm0 >> >> >> (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) >> >> `lir_abs` with this patch: >> >> vandpd 0xa1b213d(%rip),%xmm0,%xmm0 >> >> >> `lir_neg` with this patch: >> >> vxorpd 0xa12585d(%rip),%xmm0,%xmm0 >> >> >> New code is faster on our machine (using -XX:+UseKNLSetting). > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright year. Thank you! Let's ship it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22709#issuecomment-2588079362 From mdoerr at openjdk.org Mon Jan 13 20:06:52 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 13 Jan 2025 20:06:52 GMT Subject: Integrated: 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 13:12:45 GMT, Martin Doerr wrote: > 1st commit: Same as https://github.com/openjdk/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c > 2nd commit: Removal of special "Knights Landing" code for `lir_abs` and `lir_neg` from C1. > > Testing: > make run-test TEST="test/hotspot/jtreg/compiler" JTREG="VM_OPTIONS=-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting" > All passed. 
> > This is how this PR changes the C1 code on Knights Landing CPUs (emulated by -XX:+UseKNLSetting): > > `lir_abs` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpandnd %zmm0,%zmm1,%zmm0 > > > `lir_neg` without this patch: > > vmovsd -0x63(%rip),%xmm1 > vpxord %zmm0,%zmm1,%zmm0 > > > (The `vmovsd` loads the `LIR_OprFact::doubleConst(-0.0)`.) > > `lir_abs` with this patch: > > vandpd 0xa1b213d(%rip),%xmm0,%xmm0 > > > `lir_neg` with this patch: > > vxorpd 0xa12585d(%rip),%xmm0,%xmm0 > > > New code is faster on our machine (using -XX:+UseKNLSetting). This pull request has now been integrated. Changeset: 13e1ea53 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/13e1ea53c547900e76a2c7059893bf24b6ee42dc Stats: 258 lines in 12 files changed: 87 ins; 145 del; 26 mod 8346038: [REDO] - [C1] LIR Operations with one input should be implemented as LIR_Op1 Co-authored-by: Sandhya Viswanathan Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/22709 From kvn at openjdk.org Mon Jan 13 20:24:49 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 13 Jan 2025 20:24:49 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Fri, 3 Jan 2025 08:27:54 GMT, Emanuel Peter wrote: >> **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.** >> See: https://eme64.github.io/blog/2025/01/01/AutoVectorization-Status.html >> >> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! >> >> **Goal** >> >> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). 
>> >> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. >> >> **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . >> >> **Details** >> >> This looks like a rather big patch, so let me explain the parts. >> - Refactor of `MemPointer` in `mepointer.hpp/cpp`: >> - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. >> - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. >> - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. >> - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. >> - Re-write of `VPointer` based on `MemPointer`: >> - Old pattern: >> - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` >> - `VPointer[mem: 31... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 116 commits: > > - copyright 2025 > - Merge branch 'master' into JDK-8343685-VPointer-MemPointer > - manual merge > - fix printing > - rename > - fix up print > - add TestEquivalentInvariants.java > - improve documentation > - hide parser via delegation > - Merge branch 'master' into JDK-8343685-VPointer-MemPointer > - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 I have few comments. src/hotspot/share/opto/memnode.cpp line 2951: > 2949: #endif > 2950: const MemPointer pointer_use(NOT_PRODUCT(trace COMMA) use_store); > 2951: const MemPointer pointer_def(NOT_PRODUCT(trace COMMA) def_store); Why you swapped arguments? Main argument will different in debug vs product VMs. src/hotspot/share/opto/mempointer.cpp line 38: > 36: MemPointer(MemPointerParser::parse(NOT_PRODUCT(trace COMMA) > 37: mem, > 38: callback)) {} Again. Why not product argument first? src/hotspot/share/opto/mempointer.cpp line 243: > 241: // is too deep. The constant is chosen arbitrarily, not too large but big > 242: // enough for all normal cases. > 243: if (worklist.length() > 100) { return false; } May be specify size when creating `worklist` so there is no need for resizing when it is grow. src/hotspot/share/opto/mempointer.hpp line 620: > 618: > 619: private: > 620: NOT_PRODUCT( const TraceMemPointer& _trace; ) Why you prefer `_trace` to be first and not last? src/hotspot/share/opto/mempointer.hpp line 677: > 675: assert(pos == summands.length(), "copied all summands"); > 676: > 677: assert(1 <= _size && _size <= 2048 && is_power_of_2(_size), "valid size"); Where 2048 comes from? Do you have a runtime check somewhere too? src/hotspot/share/opto/noOverflowInt.hpp line 109: > 107: } else if (b.is_NaN()) { > 108: return -1; > 109: } This is strange NaN compare results. May be add comment explaining that it is not really float arithmetic "NaN". 
src/hotspot/share/opto/superword.cpp line 500: > 498: > 499: // We use two comparisons, because a subtraction could underflow. > 500: #define RETURN_CMP_VALUE_IF_NOT_EQUAL(a, b) \ Please use local static function instead of macro - you can't step through macros in debugger. test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java line 2: > 1: /* > 2: * Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved. This is new file. Why two years? test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 655: > 653: // FAILS: invariants are sorted differently, because of differently inserted Cast. > 654: // See: JDK-8330274 > 655: // Interestingly, it now passes for native, but not for objects. Should we list new success conditions instead of just commenting old? test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 674: > 672: // FAILS: invariants are sorted differently, because of differently inserted Cast. > 673: // See: JDK-8330274 > 674: // Interestingly, it now passes for native, but not for objects. The same. May be skip these 2 tests. 
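The two review points about comparison code (prefer a local static function over a macro, and use two comparisons because a subtraction could underflow) can be illustrated with a minimal C++ sketch. The names `cmp_value`, `Key`, and `cmp_key` are hypothetical and are not taken from `superword.cpp`; this only shows the general technique, not the code under review.

```cpp
#include <climits>

// Hypothetical helper (not HotSpot code): three-way compare returning -1, 0, or 1.
// Two comparisons are used instead of `a - b`, because the subtraction
// can overflow when a and b are far apart (e.g. INT_MIN and 1).
static int cmp_value(int a, int b) {
  if (a < b) { return -1; }
  if (a > b) { return 1; }
  return 0;
}

// Hypothetical sort key with two fields, compared lexicographically.
struct Key {
  int group;
  int offset;
};

// A plain static function instead of a "return if not equal" macro:
// the early return is explicit, and a debugger can step through it.
static int cmp_key(const Key& x, const Key& y) {
  int c = cmp_value(x.group, y.group);
  if (c != 0) { return c; }
  return cmp_value(x.offset, y.offset);
}
```

A function like `cmp_key` costs one extra local variable per field compared with a macro, but it keeps the control flow visible in a debugger.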
------------- PR Review: https://git.openjdk.org/jdk/pull/21926#pullrequestreview-2547675401 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913715311 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913723129 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913729662 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913742360 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913743816 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913715176 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913757811 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913748892 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913752684 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1913753662 From dholmes at openjdk.org Tue Jan 14 01:42:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Jan 2025 01:42:53 GMT Subject: RFR: 8347627: Compiler replay tests are failing after JDK-8346990 Message-ID: The `%#zx` format specifier only prepends `0x` to non-zero values but we need it for all, so switch to manual `0x%zx`. I claim this as a trivial (and urgent) fix. The only other use of this is in a test where it is fine. 
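The `%#zx` behavior described above can be reproduced with a small standalone C++ sketch. The helper `format_size` is hypothetical and only wraps `std::snprintf`; it is not the code changed by the PR.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t with a caller-supplied
// printf-style format containing exactly one size_t conversion.
static std::string format_size(const char* fmt, std::size_t value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), fmt, value);
  return std::string(buf);
}

// With the '#' flag, the "0x" prefix is only added to nonzero results:
//   format_size("%#zx", 255)  yields "0xff"
//   format_size("%#zx", 0)    yields "0"
// Writing the prefix literally makes it unconditional:
//   format_size("0x%zx", 0)   yields "0x0"
```

A consumer that expects every value to start with `0x` would therefore only misparse the zero case, which is consistent with the failure being limited to specific tests.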
Testing: - compiler replay tests now pass - tiers 1-3 (sanity) Thanks ------------- Commit messages: - 8347627: Compiler replay tests are failing after JDK-8346990 Changes: https://git.openjdk.org/jdk/pull/23093/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23093&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347627 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23093/head:pull/23093 PR: https://git.openjdk.org/jdk/pull/23093 From dholmes at openjdk.org Tue Jan 14 01:42:53 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Jan 2025 01:42:53 GMT Subject: RFR: 8347627: Compiler replay tests are failing after JDK-8346990 In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 01:38:03 GMT, Coleen Phillimore wrote: >> The `%#zx` format specifier only prepends `0x` to non-zero values but we need it for all, so switch to manual `0x%zx`. I claim this as a trivial (and urgent) fix. >> >> The only other use of this is in a test where it is fine. >> >> Testing: >> - compiler replay tests now pass >> - tiers 1-3 (sanity) >> >> Thanks > > Yes this looks good. Thank you so much for finding this bug so quickly! Thanks for the review @coleenp . I will wait for some more testing to complete just to be safe. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23093#issuecomment-2588579453 From coleenp at openjdk.org Tue Jan 14 01:42:53 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Jan 2025 01:42:53 GMT Subject: RFR: 8347627: Compiler replay tests are failing after JDK-8346990 In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 01:35:34 GMT, David Holmes wrote: > The `%#zx` format specifier only prepends `0x` to non-zero values but we need it for all, so switch to manual `0x%zx`. I claim this as a trivial (and urgent) fix. > > The only other use of this is in a test where it is fine. 
> > Testing: > - compiler replay tests now pass > - tiers 1-3 (sanity) > > Thanks Yes this looks good. Thank you so much for finding this bug so quickly! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23093#pullrequestreview-2548395718 From dholmes at openjdk.org Tue Jan 14 02:05:25 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Jan 2025 02:05:25 GMT Subject: RFR: 8347627: Compiler replay tests are failing after JDK-8346990 [v2] In-Reply-To: References: Message-ID: > The `%#zx` format specifier only prepends `0x` to non-zero values but we need it for all, so switch to manual `0x%zx`. I claim this as a trivial (and urgent) fix. > > The only other use of this is in a test where it is fine. > > Testing: > - compiler replay tests now pass > - tiers 1-3 (sanity) > > Thanks David Holmes has updated the pull request incrementally with one additional commit since the last revision: Update test cases for clarity of difference ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23093/files - new: https://git.openjdk.org/jdk/pull/23093/files/a8153b02..73e332ef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23093&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23093&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23093.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23093/head:pull/23093 PR: https://git.openjdk.org/jdk/pull/23093 From coleenp at openjdk.org Tue Jan 14 02:05:25 2025 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 14 Jan 2025 02:05:25 GMT Subject: RFR: 8347627: Compiler replay tests are failing after JDK-8346990 [v2] In-Reply-To: References: Message-ID: <-dMJTX7zs7sVOf8yb6b_mXyJtpOYOu_wIRJC8nOOZxE=.6b196bc2-2a89-43a4-8354-a7b9ac1a1034@github.com> On Tue, 14 Jan 2025 02:01:40 GMT, David Holmes wrote: >> The `%#zx` format specifier only prepends `0x` to non-zero values but we need it 
for all, so switch to manual `0x%zx`. I claim this as a trivial (and urgent) fix. >> >> The only other use of this is in a test where it is fine. >> >> Testing: >> - compiler replay tests now pass >> - tiers 1-3 (sanity) >> >> Thanks > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Update test cases for clarity of difference Looks good! Still trivial and urgent. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23093#pullrequestreview-2548437708 From dholmes at openjdk.org Tue Jan 14 03:22:49 2025 From: dholmes at openjdk.org (David Holmes) Date: Tue, 14 Jan 2025 03:22:49 GMT Subject: Integrated: 8347627: Compiler replay tests are failing after JDK-8346990 In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 01:35:34 GMT, David Holmes wrote: > The `%#zx` format specifier only prepends `0x` to non-zero values but we need it for all, so switch to manual `0x%zx`. I claim this as a trivial (and urgent) fix. > > The only other use of this is in a test where it is fine. > > Testing: > - compiler replay tests now pass > - tiers 1-3 (sanity) > > Thanks This pull request has now been integrated. 
Changeset: c1d322ff Author: David Holmes URL: https://git.openjdk.org/jdk/commit/c1d322fff42720146dfb3846bd7d8514b1bdf383 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod 8347627: Compiler replay tests are failing after JDK-8346990 Reviewed-by: coleenp ------------- PR: https://git.openjdk.org/jdk/pull/23093 From qamai at openjdk.org Tue Jan 14 03:40:34 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 14 Jan 2025 03:40:34 GMT Subject: RFR: 8347481: C2: Remove the control input of some nodes In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 14:26:15 GMT, Damon Fenacci wrote: >> Hi, >> >> While working on [JDK-8347365](https://bugs.openjdk.org/browse/JDK-8347365), I noticed that there are some nodes that have their control inputs being set in a seemingly erroneous manner. This patch removes the control inputs for those nodes. >> >> Please review this PR, thanks a lot. > > Nice cleanup @merykitty. Thanks! > I was just wondering if there could be more nodes where the control input is wrongly set (possibly enough for a followup issue?) @dafedafe Thanks, there's `LoadKlassNode`, which very rarely takes a control input, but the reason given does not convince me. However, it seems to be less clear-cut than these nodes, so I think a separate RFE is more suitable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23055#issuecomment-2588851372 From fyang at openjdk.org Tue Jan 14 03:54:46 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Jan 2025 03:54:46 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH Message-ID: Hi, please consider this change. We have a different base_offset (4-byte instead of 8-byte aligned) with COH. This causes misaligned memory accesses for several intrinsics like String.Compare or String.Equals. The reason is that we assume 8-byte alignment and process one 8-byte word starting at the first array element for each iteration in the main loop.
As a result, we have performance regressions on platforms with slow misaligned memory accesses like Unmatched and Premier P550 SBCs. Correctness test on linux-riscv64: - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (release) - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (release) - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (fastdebug) - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (fastdebug) Performance test on Premier P550 (-XX:+UseParallelGC -XX:+AlwaysPreTouch -Xms8g -Xmx8g): 1. SPECjbb2005 Score Without Patch 1.1 -XX:-UseCompactObjectHeaders: 32666 1.2 -XX:+UseCompactObjectHeaders: 27610 2. SPECjbb2005 Score With Patch 2.1 -XX:-UseCompactObjectHeaders: 32820 2.2 -XX:+UseCompactObjectHeaders: 34179 ------------- Commit messages: - Add assertions - Comment - 8347489: RISC-V: Misaligned memory access with COH Changes: https://git.openjdk.org/jdk/pull/23053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8347489 Stats: 123 lines in 3 files changed: 96 ins; 3 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/23053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23053/head:pull/23053 PR: https://git.openjdk.org/jdk/pull/23053 From mli at openjdk.org Tue Jan 14 03:54:47 2025 From: mli at openjdk.org (Hamlin Li) Date: Tue, 14 Jan 2025 03:54:47 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH In-Reply-To: References: Message-ID: On Sun, 12 Jan 2025 03:45:45 GMT, Fei Yang wrote: > Hi, please consider this change. > > We have different base_offset (4 bytes instead of 8 bytes aligned) with COH. This causes misaligned memory accesses for several instrinsics like String.Compare or String.Equals. 
The reason is that we assume 8-byte alignment and process one 8-byte word starting at the first array element for each iteration in the main loop. As a result, we have performance regressions on platforms with slow misaligned memory accesses like Unmatched and Premier P550 SBCs. > > Correctness test on linux-riscv64: > - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (release) > - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (release) > - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (fastdebug) > - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (fastdebug) > > Performance test on Premier P550 (-XX:+UseParallelGC -XX:+AlwaysPreTouch -Xms8g -Xmx8g): > > 1. SPECjbb2005 Score Without Patch > 1.1 -XX:-UseCompactObjectHeaders: 32666 > 1.2 -XX:+UseCompactObjectHeaders: 27610 > > 2. SPECjbb2005 Score With Patch > 2.1 -XX:-UseCompactObjectHeaders: 32820 > 2.2 -XX:+UseCompactObjectHeaders: 34179 Thanks for the patch. Can you post the performance data in description? And some minor comments. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1415: > 1413: int base_offset1 = arrayOopDesc::base_offset_in_bytes(T_BYTE); > 1414: int base_offset2 = arrayOopDesc::base_offset_in_bytes(T_CHAR); > 1415: An assert of either 4 || 8 would be helpful here? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1439: > 1437: beq(str1, str2, DONE); > 1438: int base_offset = isLL ? base_offset1 : base_offset2; > 1439: if ((base_offset % 8) != 0) { If `AvoidUnalignedAccesses == false`, do we still need this piece of code? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1693: > 1691: > 1692: // Load 4 bytes once to compare for alignment before main loop. > 1693: if ((base_offset % 8) != 0) { similar comment here. 
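The alignment idea discussed in this review (compare 4 bytes first when `base_offset` is not a multiple of 8, then run the 8-byte main loop) can be sketched in portable C++. This is only an illustration under stated assumptions: `arrays_equal` is a hypothetical function, the object start is assumed to be 8-byte aligned, and `base_offset` is assumed to be a multiple of 4. The actual change is RISC-V assembly emitted by `c2_MacroAssembler_riscv.cpp` and is not reproduced here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical sketch (not HotSpot code): compares `len` payload bytes of two
// objects whose payload starts `base_offset` bytes after the object start.
// Assumes the object start is 8-byte aligned and base_offset % 4 == 0.
static bool arrays_equal(const unsigned char* a, const unsigned char* b,
                         std::size_t base_offset, std::size_t len) {
  std::size_t pos = base_offset;
  const std::size_t end = base_offset + len;

  // Peel one 4-byte compare so that the main loop starts at an offset
  // that is a multiple of 8 (e.g. base_offset 12 becomes 16).
  if ((base_offset % 8) != 0 && end - pos >= 4) {
    std::uint32_t x, y;
    std::memcpy(&x, a + pos, sizeof(x));
    std::memcpy(&y, b + pos, sizeof(y));
    if (x != y) { return false; }
    pos += 4;
  }

  // Main loop: one 8-byte word per iteration, now at 8-byte-aligned offsets.
  while (end - pos >= 8) {
    std::uint64_t x, y;
    std::memcpy(&x, a + pos, sizeof(x));
    std::memcpy(&y, b + pos, sizeof(y));
    if (x != y) { return false; }
    pos += 8;
  }

  // Tail: remaining bytes, compared one at a time.
  while (pos < end) {
    if (a[pos] != b[pos]) { return false; }
    pos++;
  }
  return true;
}
```

With a `base_offset` of 16 the peeled step is skipped entirely, so the extra work only applies to the layout where the payload is 4-byte but not 8-byte aligned.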
------------- PR Review: https://git.openjdk.org/jdk/pull/23053#pullrequestreview-2546113978 PR Review Comment: https://git.openjdk.org/jdk/pull/23053#discussion_r1912912442 PR Review Comment: https://git.openjdk.org/jdk/pull/23053#discussion_r1912897443 PR Review Comment: https://git.openjdk.org/jdk/pull/23053#discussion_r1912900082 From fyang at openjdk.org Tue Jan 14 03:54:48 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Jan 2025 03:54:48 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH In-Reply-To: References: Message-ID: <59E928hzasXG5OY51fIeP0IQLNC7v9W8ia6PlMujcQ4=.76123ae3-ca35-48b3-90b8-d99807d88f39@github.com> On Mon, 13 Jan 2025 10:05:31 GMT, Hamlin Li wrote: >> Hi, please consider this change. >> >> We have different base_offset (4 bytes instead of 8 bytes aligned) with COH. This causes misaligned memory accesses for several instrinsics like String.Compare or String.Equals. The reason is that we assume 8-byte alignment and process one 8-byte word starting at the first array element for each iteration in the main loop. As a result, we have performance regressions on platforms with slow misaligned memory accesses like Unmatched and Premier P550 SBCs. >> >> Correctness test on linux-riscv64: >> - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (release) >> - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (release) >> - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (fastdebug) >> - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (fastdebug) >> >> Performance test on Premier P550 (-XX:+UseParallelGC -XX:+AlwaysPreTouch -Xms8g -Xmx8g): >> >> 1. SPECjbb2005 Score Without Patch >> 1.1 -XX:-UseCompactObjectHeaders: 32666 >> 1.2 -XX:+UseCompactObjectHeaders: 27610 >> >> 2. 
SPECjbb2005 Score With Patch >> 2.1 -XX:-UseCompactObjectHeaders: 32820 >> 2.2 -XX:+UseCompactObjectHeaders: 34179 > > Thanks for the patch. > Can you post the performance data in the description? > And some minor comments. @Hamlin-Li Thanks for the suggestions. I have updated accordingly. Note that base_offset is 12 and 16 bytes with and without COH, respectively. And I have also added some SPECjbb2005 scores in the PR description for reference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23053#issuecomment-2588886451 From epeter at openjdk.org Tue Jan 14 06:36:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 06:36:48 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 19:38:45 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > src/hotspot/share/opto/memnode.cpp line 2951: > >> 2949: #endif >> 2950: const MemPointer pointer_use(NOT_PRODUCT(trace COMMA) use_store); >> 2951: const MemPointer pointer_def(NOT_PRODUCT(trace COMMA) def_store); > > Why did you swap the arguments? The main argument will be in a different position in debug vs product VMs. Ok, I will put `NOT_PRODUCT` last. I think it was somehow easier, but cannot remember why now. > src/hotspot/share/opto/noOverflowInt.hpp line 109: > >> 107: } else if (b.is_NaN()) { >> 108: return -1; >> 109: } > > These are strange NaN compare results. Maybe add a comment explaining that it is not really float arithmetic "NaN".
@vnkozlov At the top of the file I explain the meaning of `NaN`: // Wrapper around jint, which detects overflow. // If any operation overflows, then it returns a NaN. class NoOverflowInt { private: bool _is_NaN; // overflow, uninitialized, etc. jint _value; Is that sufficient? Or would you prefer me to rename the int `NaN` to something else? I added a comment line now, I hope that helps locally: static int cmp(const NoOverflowInt& a, const NoOverflowInt& b) { // Order NaN (overflow, uninitialized, etc) after non-NaN. if (a.is_NaN()) { return b.is_NaN() ? 0 : 1; } else if (b.is_NaN()) { return -1; } if (a.value() < b.value()) { return -1; } if (a.value() > b.value()) { return 1; } return 0; } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914314070 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914313125 From epeter at openjdk.org Tue Jan 14 06:42:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 06:42:59 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: <1Qb_4hwbUeDfd-DdqmvbZ3QqmI_G9muLAFrX8tKlNMQ=.29c6c0ef-e1b8-4ee3-af5a-a473be8228d0@github.com> On Mon, 13 Jan 2025 20:02:16 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > src/hotspot/share/opto/mempointer.hpp line 620: > >> 618: >> 619: private: >> 620: NOT_PRODUCT( const TraceMemPointer& _trace; ) > > Why you prefer `_trace` to be first and not last? 
Ah, I think in a previous version I needed the _trace while initializing other parts... but I don't any more. Nice catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914319474 From epeter at openjdk.org Tue Jan 14 06:47:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 06:47:43 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 20:03:38 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > src/hotspot/share/opto/mempointer.hpp line 677: > >> 675: assert(pos == summands.length(), "copied all summands"); >> 676: >> 677: assert(1 <= _size && _size <= 2048 && is_power_of_2(_size), "valid size"); > > Where does 2048 come from? Do you have a runtime check somewhere too? It is just a sanity check: currently no platform has vectors larger than 2048. Replaced the assert with: `assert(1 <= _size && _size <= 2048 && is_power_of_2(_size), "sanity: no vector is expected to be larger");` > test/hotspot/jtreg/compiler/loopopts/superword/TestEquivalentInvariants.java line 2: > >> 1: /* >> 2: * Copyright (c) 2024, 2025, Oracle and/or its affiliates. All rights reserved. > > This is a new file. Why two years? Nice catch. I must have just blindly extended it when I updated the copyright year.
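As a small illustration of the sanity condition discussed above, the following standalone sketch checks the same shape of predicate: a size between 1 and 2048 that is a power of two. The helpers `demo_is_power_of_2` and `demo_is_sane_size` are hypothetical stand-ins written for this example; HotSpot uses its own `is_power_of_2` utility, which is not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Stand-alone power-of-two test: a positive value with exactly one bit set.
constexpr bool demo_is_power_of_2(std::int64_t x) {
  return x > 0 && (x & (x - 1)) == 0;
}

// Same shape as the asserted condition: 1 <= size <= 2048 and a power of 2.
constexpr bool demo_is_sane_size(std::int64_t size) {
  return 1 <= size && size <= 2048 && demo_is_power_of_2(size);
}
```

As in the thread, the upper bound is only a sanity limit under the assumption that no vector is larger, not a runtime guarantee.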
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914321984 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914323015 From xgong at openjdk.org Tue Jan 14 06:50:41 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 14 Jan 2025 06:50:41 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 02:23:30 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Implement apply_identity src/hotspot/share/opto/phaseX.cpp line 2301: > 2299: // that may undo the changes done during lowering. > 2300: > 2301: return k->LoweredIdeal(this); I'm sorry that I still cannot understand well what this method is expected to do for a node. For example, if we need to add some architecture specific optimization for `MulNode` like AArch64, we can add the lowering code in `lower_node_platform` for AArch64, right? Do we also need to override the `LoweredIdeal()` for `MulNode` ? 
Thanks! src/hotspot/share/opto/phaseX.cpp line 2310: > 2308: Node* PhaseLowering::lower_node(Node* n) { > 2309: // Apply shared lowering transforms > 2310: Per my understanding, this is a backend-specific lowering phase. Is there any scenario where a platform-independent lowering is needed here? As we already have the common GVN phase for common node idealization, is there any difference for such shared transformations? Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1914323626 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1914320271 From epeter at openjdk.org Tue Jan 14 07:04:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 07:04:44 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 19:44:10 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > src/hotspot/share/opto/mempointer.cpp line 38: > >> 36: MemPointer(MemPointerParser::parse(NOT_PRODUCT(trace COMMA) >> 37: mem, >> 38: callback)) {} > > Again. Why not product argument first? Ah, here I wanted an optional parameter `callback` (default `empty()`). Parameters with default values can only be placed last. I decided to refactor this with 2 constructors, one with and one without the `callback` parameter. The one without just delegates to the one with, passing the default empty parameter. It's a little more code but works.
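The two-constructor approach described above can be sketched as follows. This is only an illustration under assumptions: `DemoPointer`, `DemoCallback` and `DemoTrace` are hypothetical stand-ins, not the actual `MemPointer` types, and the `NOT_PRODUCT` macro wrapping of the trace argument is left out to keep the example standalone.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an optional callback; not a HotSpot type.
struct DemoCallback {
  std::string name;
  static DemoCallback empty() { return DemoCallback{"empty"}; }
};

// Hypothetical stand-in for a debug-only tracing handle.
struct DemoTrace {
  bool enabled;
};

class DemoPointer {
public:
  // Full constructor: every argument is explicit, so the debug-only
  // argument can come last without colliding with a defaulted parameter.
  DemoPointer(int mem, const DemoCallback& callback, const DemoTrace& trace)
    : _mem(mem), _callback_name(callback.name), _trace_enabled(trace.enabled) {}

  // Convenience constructor: delegates to the full one and supplies the
  // empty callback, taking the place of a default argument.
  DemoPointer(int mem, const DemoTrace& trace)
    : DemoPointer(mem, DemoCallback::empty(), trace) {}

  int mem() const { return _mem; }
  const std::string& callback_name() const { return _callback_name; }
  bool trace_enabled() const { return _trace_enabled; }

private:
  int _mem;
  std::string _callback_name;
  bool _trace_enabled;
};
```

The cost is one extra constructor; the gain is that the main argument stays in the same position whether or not the debug-only argument is compiled in.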
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914337700 From epeter at openjdk.org Tue Jan 14 07:23:55 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 07:23:55 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 06:34:06 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/memnode.cpp line 2951: >> >>> 2949: #endif >>> 2950: const MemPointer pointer_use(NOT_PRODUCT(trace COMMA) use_store); >>> 2951: const MemPointer pointer_def(NOT_PRODUCT(trace COMMA) def_store); >> >> Why did you swap the arguments? The main argument will be in a different position in debug vs product VMs. > > Ok, I will put `NOT_PRODUCT` last. I think it was somehow easier, but cannot remember why now. Ah, yes. It was because of parameters with default values, see other comments. But I'm now handling it differently after your comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914354669 From dfenacci at openjdk.org Tue Jan 14 07:34:37 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 14 Jan 2025 07:34:37 GMT Subject: [jdk24] RFR: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) [v2] In-Reply-To: References: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> Message-ID: On Mon, 13 Jan 2025 11:54:08 GMT, Tobias Hartmann wrote: >> Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update c1_IR.hpp >> - Update c1_Compilation.hpp > > Looks good and trivial to me. Thanks @TobiHartmann and @vnkozlov for your reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/23065#issuecomment-2589210615 From epeter at openjdk.org Tue Jan 14 07:34:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 07:34:50 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 19:50:24 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > src/hotspot/share/opto/mempointer.cpp line 243: > >> 241: // is too deep. The constant is chosen arbitrarily, not too large but big >> 242: // enough for all normal cases. >> 243: if (worklist.length() > 100) { return false; } > > May be specify size when creating `worklist` so there is no need for resizing when it is grow. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914364605 From thartmann at openjdk.org Tue Jan 14 07:36:41 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 Jan 2025 07:36:41 GMT Subject: [jdk24] RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 15:12:07 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [062f2dcf](https://github.com/openjdk/jdk/commit/062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Christian Hagedorn on 13 Jan 2025 and was reviewed by Tobias Hartmann. > > Thanks! Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/23077#pullrequestreview-2548984029 From dfenacci at openjdk.org Tue Jan 14 07:37:44 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 14 Jan 2025 07:37:44 GMT Subject: [jdk24] Integrated: 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) In-Reply-To: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> References: <6zBwDnhH5Ox-I5h-cyrBHf_o_Em6d16ixMVT8jxl8BE=.a9736d27-e6f8-43ca-9b8e-448ebe730fdc@github.com> Message-ID: On Mon, 13 Jan 2025 10:23:52 GMT, Damon Fenacci wrote: > Hi all, > > This pull request contains a backport of commit [b37f1236](https://github.com/openjdk/jdk/commit/b37f12362507fb2cd291a2b44b4777ba76efd35e) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Damon Fenacci on 13 Jan 2025 and was reviewed by Tobias Hartmann and Vladimir Kozlov. > > Includes that were re-added by the original backout in these 2 files > `src/hotspot/share/c1/c1_Compilation.hpp` > `src/hotspot/share/c1/c1_IR.hpp` > could not be cleanly applied but are not needed in the backport as the change happened after the jdk24 branch. > > Thanks! This pull request has now been integrated. 
Changeset: e76cc445 Author: Damon Fenacci URL: https://git.openjdk.org/jdk/commit/e76cc445025bf948f4255fd96d78ec85573ed1f0 Stats: 40 lines in 7 files changed: 3 ins; 26 del; 11 mod 8347407: [BACKOUT] C1/C2 don't handle allocation failure properly during initialization (RuntimeStub::new_runtime_stub fatal crash) Reviewed-by: thartmann, kvn Backport-of: b37f12362507fb2cd291a2b44b4777ba76efd35e ------------- PR: https://git.openjdk.org/jdk/pull/23065 From epeter at openjdk.org Tue Jan 14 07:40:44 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 07:40:44 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 20:12:22 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 655: > >> 653: // FAILS: invariants are sorted differently, because of differently inserted Cast. >> 654: // See: JDK-8330274 >> 655: // Interestingly, it now passes for native, but not for objects. > > Should we list new success conditions instead of just commenting old? I cannot make good conditions currently, sadly. IR `applyIf` can rely on VM flags, CPU features etc. But in my case, it would pass for `native` memory, but not for `array` cases. And that is decided by the test command line arguments of the runs. 
It gets passed in like `-DmemorySegmentProviderNameForTestVM=Native` or `-DmemorySegmentProviderNameForTestVM=ByteArray` etc. This one is array, and for some reason does not currently parse pointers sufficiently well to vectorize: ` * @run driver compiler.loopopts.superword.TestMemorySegment ByteArray` But this is native, and vectorizes: ` * @run driver compiler.loopopts.superword.TestMemorySegment Native` @vnkozlov @chhagedorn is there any way I can currently do an `applyIf` for that? I could remove the IR rule rather than comment it out, if that is better for you. > test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java line 674: > >> 672: // FAILS: invariants are sorted differently, because of differently inserted Cast. >> 673: // See: JDK-8330274 >> 674: // Interestingly, it now passes for native, but not for objects. > > The same. Maybe skip these 2 tests. What do you mean by skip? Remove the IR rule rather than comment it out? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914369689 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914369953 From epeter at openjdk.org Tue Jan 14 07:46:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 07:46:53 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: <5f71ZGdtgk5antOiUyhln96oHTyvtja5VvKls9rjKGA=.dd8108a4-e00b-4365-9d68-182bf9cf7a70@github.com> On Mon, 13 Jan 2025 20:17:23 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ...
and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > src/hotspot/share/opto/superword.cpp line 500: > >> 498: >> 499: // We use two comparisons, because a subtraction could underflow. >> 500: #define RETURN_CMP_VALUE_IF_NOT_EQUAL(a, b) \ > > Please use local static function instead of macro - you can't step through macros in debugger. Hmm, that is tricky here: // We use two comparisons, because a subtraction could underflow. #define RETURN_CMP_VALUE_IF_NOT_EQUAL(a, b) \ if (a < b) { return -1; } \ if (a > b) { return 1; } The macro only returns if we get a non-equal case. But it does not return when the values are equal. This is helpful in code where there are repeated comparisons, and we want to only return once we have a non-equal. Example: RETURN_CMP_VALUE_IF_NOT_EQUAL(a_con, b_con); RETURN_CMP_VALUE_IF_NOT_EQUAL(a->original_index(), b->original_index()); We first compare the `con`, but if they are the same we do not return but compare the `original_index`. It simplifies the code. But I suppose I can refactor it now, especially because I use the macro less often. Before refactoring it was something like: -// To be in the same group, two VPointers must be the same, -// except for the offset. -int VPointer::cmp_for_sort_by_group(const VPointer** p1, const VPointer** p2) { - const VPointer* a = *p1; - const VPointer* b = *p2; - - RETURN_CMP_VALUE_IF_NOT_EQUAL(a->base()->_idx, b->base()->_idx); - RETURN_CMP_VALUE_IF_NOT_EQUAL(a->mem()->Opcode(), b->mem()->Opcode()); - RETURN_CMP_VALUE_IF_NOT_EQUAL(a->scale_in_bytes(), b->scale_in_bytes()); - - int a_inva_idx = a->invar() == nullptr ? 0 : a->invar()->_idx; - int b_inva_idx = b->invar() == nullptr ? 
0 : b->invar()->_idx; - RETURN_CMP_VALUE_IF_NOT_EQUAL(a_inva_idx, b_inva_idx); - - return 0; // equal -} ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914376637 From fyang at openjdk.org Tue Jan 14 07:55:51 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Jan 2025 07:55:51 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH [v2] In-Reply-To: References: Message-ID: > Hi, please consider this change. > > We have a different base_offset (4-byte instead of 8-byte aligned) with COH. This causes misaligned memory accesses for several intrinsics like String.Compare or String.Equals. The reason is that we assume 8-byte alignment and process one 8-byte word starting at the first array element for each iteration in the main loop. As a result, we have performance regressions on platforms with slow misaligned memory accesses like Unmatched and Premier P550 SBCs. > > Correctness test on linux-riscv64: > - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (release) > - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (release) > - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (fastdebug) > - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (fastdebug) > > Performance test on Premier P550 (-XX:+UseParallelGC -XX:+AlwaysPreTouch -Xms8g -Xmx8g): > > 1. SPECjbb2005 Score Without Patch > 1.1 -XX:-UseCompactObjectHeaders: 32666 > 1.2 -XX:+UseCompactObjectHeaders: 27610 > > 2.
SPECjbb2005 Score With Patch > 2.1 -XX:-UseCompactObjectHeaders: 32820 > 2.2 -XX:+UseCompactObjectHeaders: 34179 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Fix assertions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23053/files - new: https://git.openjdk.org/jdk/pull/23053/files/f9efaa3d..23d9add9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23053&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23053&range=00-01 Stats: 16 lines in 2 files changed: 8 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23053/head:pull/23053 PR: https://git.openjdk.org/jdk/pull/23053 From thartmann at openjdk.org Tue Jan 14 07:57:36 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 Jan 2025 07:57:36 GMT Subject: [jdk24] RFR: 8346831: Remove the extra closing parenthesis in CTW Makefile In-Reply-To: References: Message-ID: On Sun, 29 Dec 2024 11:33:43 GMT, Qizheng Xing wrote: > Hi all, > > This pull request contains a backport of commit [79958470](https://github.com/openjdk/jdk/commit/79958470e08ade2d3584748e020bd2e18092c0cf) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Qizheng Xing on 29 Dec 2024 and was reviewed by Chen Liang, Kim Barrett, Leonid Mesnik and Julian Waters. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22891#pullrequestreview-2549014531 From xgong at openjdk.org Tue Jan 14 08:11:40 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 14 Jan 2025 08:11:40 GMT Subject: RFR: 8342676: Unsigned Vector Min / Max transforms [v2] In-Reply-To: References: <21riF_Q0FMyzOh_sakTclKfYa-nJm4klfkyHEYi4ctI=.76933a14-fb5e-447e-873a-59a2b870b842@github.com> Message-ID: On Tue, 7 Jan 2025 08:58:12 GMT, Jatin Bhateja wrote: >> Adding following IR transforms for unsigned vector Min / Max nodes. >> >> => UMinV (UMinV(a, b), UMaxV(a, b)) => UMinV(a, b) >> => UMinV (UMinV(a, b), UMaxV(b, a)) => UMinV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(a, b)) => UMaxV(a, b) >> => UMaxV (UMinV(a, b), UMaxV(b, a)) => UMaxV(a, b) >> => UMaxV (a, a) => a >> => UMinV (a, a) => a >> >> New IR validation test accompanies the patch. >> >> This is a follow-up PR for https://github.com/openjdk/jdk/pull/20507 >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Updating copyright year of modified files > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - Update IR transforms and tests > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8342676 > - 8342676: Unsigned Vector Min / Max transforms test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 1213: > 1211: > 1212: public static final String UMIN_VB = VECTOR_PREFIX + "UMIN_VB" + POSTFIX; > 1213: static { Suggestion: public static final String UMIN_VB = VECTOR_PREFIX + "UMIN_VB" + POSTFIX; static { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21604#discussion_r1914397192 From epeter at openjdk.org Tue Jan 14 08:15:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 08:15:21 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v5] In-Reply-To: References: Message-ID: > **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.** > See: https://eme64.github.io/blog/2025/01/01/AutoVectorization-Status.html > > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. 
> > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . > > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mempointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. > - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] ... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 123 commits: - Merge branch 'master' into JDK-8343685-VPointer-MemPointer - for vnkozlov part 6 - for vnkozlov part 5 - for vnkozlov part 4 - for vnkozlov part 3 - for vnkozlov part 2 - for vnkozlov part 1 - copyright 2025 - Merge branch 'master' into JDK-8343685-VPointer-MemPointer - manual merge - ...
and 113 more: https://git.openjdk.org/jdk/compare/c1d322ff...7f101622 ------------- Changes: https://git.openjdk.org/jdk/pull/21926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=04 Stats: 4065 lines in 18 files changed: 1861 ins; 1539 del; 665 mod Patch: https://git.openjdk.org/jdk/pull/21926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21926/head:pull/21926 PR: https://git.openjdk.org/jdk/pull/21926 From epeter at openjdk.org Tue Jan 14 08:15:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 08:15:22 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 20:22:06 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 116 commits: >> >> - copyright 2025 >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - ... and 106 more: https://git.openjdk.org/jdk/compare/84e6432b...b64f9295 > > I have few comments. @vnkozlov thanks for having a look at it! I think I have addressed all your points. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21926#issuecomment-2589269998 From epeter at openjdk.org Tue Jan 14 08:15:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 08:15:22 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v4] In-Reply-To: <5f71ZGdtgk5antOiUyhln96oHTyvtja5VvKls9rjKGA=.dd8108a4-e00b-4365-9d68-182bf9cf7a70@github.com> References: <5f71ZGdtgk5antOiUyhln96oHTyvtja5VvKls9rjKGA=.dd8108a4-e00b-4365-9d68-182bf9cf7a70@github.com> Message-ID: On Tue, 14 Jan 2025 07:44:27 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 500: >> >>> 498: >>> 499: // We use two comparisons, because a subtraction could underflow. >>> 500: #define RETURN_CMP_VALUE_IF_NOT_EQUAL(a, b) \ >> >> Please use local static function instead of macro - you can't step through macros in debugger. > > Hmm, that is tricky here: > > // We use two comparisons, because a subtraction could underflow. > #define RETURN_CMP_VALUE_IF_NOT_EQUAL(a, b) \ > if (a < b) { return -1; } \ > if (a > b) { return 1; } > > The macro only returns if we get a non-equal case. But it does not return when the values are equal. > > This is helpful in code where there are repeated comparisons, and we want to only return once we have a non-equal. > > Example: > > RETURN_CMP_VALUE_IF_NOT_EQUAL(a_con, b_con); > > RETURN_CMP_VALUE_IF_NOT_EQUAL(a->original_index(), b->original_index()); > > We first compare the `con`, but if they are the same we do not return but compare the `original_index`. > > It simplifies the code. But I suppose I can refactor it now, especially because I use the macro less often. Before refactoring it was something like: > > -// To be in the same group, two VPointers must be the same, > -// except for the offset. 
> -int VPointer::cmp_for_sort_by_group(const VPointer** p1, const VPointer** p2) { > - const VPointer* a = *p1; > - const VPointer* b = *p2; > - > - RETURN_CMP_VALUE_IF_NOT_EQUAL(a->base()->_idx, b->base()->_idx); > - RETURN_CMP_VALUE_IF_NOT_EQUAL(a->mem()->Opcode(), b->mem()->Opcode()); > - RETURN_CMP_VALUE_IF_NOT_EQUAL(a->scale_in_bytes(), b->scale_in_bytes()); > - > - int a_inva_idx = a->invar() == nullptr ? 0 : a->invar()->_idx; > - int b_inva_idx = b->invar() == nullptr ? 0 : b->invar()->_idx; > - RETURN_CMP_VALUE_IF_NOT_EQUAL(a_inva_idx, b_inva_idx); > - > - return 0; // equal > -} Ok, this is my solution now: + + // We use two comparisons, because a subtraction could underflow. + template <typename T> + static int cmp_code(T a, T b) { + if (a < b) { return -1; } + if (a > b) { return 1; } + return 0; + } And use it like this: + int c_con = cmp_code(a_con, b_con); + if (c_con != 0) { return c_con; } Instead of `RETURN_CMP_VALUE_IF_NOT_EQUAL(a_con, b_con);` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1914401876 From xgong at openjdk.org Tue Jan 14 08:24:41 2025 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 14 Jan 2025 08:24:41 GMT Subject: RFR: 8342393: Promote commutative vector IR node sharing [v7] In-Reply-To: <4cKmKYlejtKkkUOV9NPh4V6qeaPyIczG3zrddCC9CxU=.0c3503a8-72d4-4911-94fc-0a30ac9fed29@github.com> References: <7TzEoPWnq71MZZOzF_HBXr59hMAX_eNgu12ouhjalm8=.0dd0ce75-ecdb-4e58-86b4-82fb04eceea8@github.com> <4cKmKYlejtKkkUOV9NPh4V6qeaPyIczG3zrddCC9CxU=.0c3503a8-72d4-4911-94fc-0a30ac9fed29@github.com> Message-ID: On Thu, 9 Jan 2025 11:33:17 GMT, Jatin Bhateja wrote: >> Patch promotes the sharing of commutative vector IR with the same inputs but different input ordering.
>> Unlike scalar IR where we perform edge swapping by [sorting inputs](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/addnode.cpp#L122) based on node indices during IR idealization, for vector IR we chose a simpler approach to decorate commutative operations with a special node-level flag during IR construction thus >> obviating any dependency on explicit idealization routines. This flag is later used during GVN hashing to enable node sharing. >> >> Following are the performance stats for JMH micro included with the patch. >> >> >> Granite Rapids (P-core Xeon Server) >> Baseline : >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 8982.549 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 6072.773 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2368.856 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 15215.087 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 11963.554 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 7036.088 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 2906.731 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeShortOperationShairing 1024 thrpt 2 17148.131 ops/ms >> >> Sierra Forest (E-core Xeon Server) >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> VectorCommutativeOperSharingBenchmark.commutativeByteOperationShairing 1024 thrpt 2 2444.359 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeIntOperationShairing 1024 thrpt 2 1710.256 ops/ms >> VectorCommutativeOperSharingBenchmark.commutativeLongOperationShairing 1024 thrpt 2 308.766 ops/ms >> VectorCommutativeOperSharingBenc... 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > GHA fix src/hotspot/share/opto/vectornode.hpp line 191: > 189: AddVBNode(Node* in1, Node* in2, const TypeVect* vt) : VectorNode(in1,in2,vt) { > 190: add_flag(Node::Flag_is_commutative_vector_oper); > 191: } So does this still work if these vector nodes are predicated which may append another mask input in future? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22863#discussion_r1914416538 From thartmann at openjdk.org Tue Jan 14 09:12:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 Jan 2025 09:12:51 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v6] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: <1JzxRMD4fPivgxvQFkWBq66zg8cPZRu7BwwyboXDm6M=.20cbce3b-62a3-4dfb-a232-d6ccd641e481@github.com> On Mon, 16 Dec 2024 16:59:57 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. >> >> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of nmethod.
Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > removing dead code I see this failure in our testing with an internal test: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/workspace/open/src/hotspot/share/code/nmethod.cpp:1513), pid=1863674, tid=1863692 # assert(static_cast(_metadata_offset) == reloc_size + oop_size) failed: failed: 21744 != 87280 Current CompileTask: C2:61370 3693 b 4 org.apache.lucene.analysis.en.KStemData1:: (27623 bytes) Stack: [0x00007f3acf3f5000,0x00007f3acf4f5000], sp=0x00007f3acf4f0a50, free space=1006k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x14d9151] nmethod::nmethod(Method*, CompilerType, int, int, int, int, int, unsigned char*, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, CompLevel, char*, int, JVMCINMethodData*)+0x981 (nmethod.cpp:1513) V [libjvm.so+0x14d94df] nmethod::new_nmethod(methodHandle const&, int, int, CodeOffsets*, int, DebugInformationRecorder*, Dependencies*, CodeBuffer*, int, OopMapSet*, ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, CompLevel, char*, int, JVMCINMethodData*)+0x22f (nmethod.cpp:1200) V [libjvm.so+0x926cb4] ciEnv::register_method(ciMethod*, int, CodeOffsets*, int, CodeBuffer*, int, OopMapSet*, 
ExceptionHandlerTable*, ImplicitExceptionTable*, AbstractCompiler*, bool, bool, bool, bool, int)+0x4c4 (ciEnv.cpp:1063) V [libjvm.so+0x157b0d5] PhaseOutput::install_code(ciMethod*, int, AbstractCompiler*, bool, bool)+0x125 (output.cpp:3443) V [libjvm.so+0xa54d02] Compile::Code_Gen()+0x612 (compile.cpp:3033) V [libjvm.so+0xa579cf] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1c6f (compile.cpp:885) V [libjvm.so+0x89f495] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) V [libjvm.so+0xa63a88] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x928 (compileBroker.cpp:2319) V [libjvm.so+0xa647c8] CompileBroker::compiler_thread_loop()+0x528 (compileBroker.cpp:1977) V [libjvm.so+0xf3478e] JavaThread::thread_main_inner()+0xee (javaThread.cpp:777) V [libjvm.so+0x1880536] Thread::call_run()+0xb6 (thread.cpp:232) V [libjvm.so+0x155a188] thread_native_entry(Thread*)+0x128 (os_linux.cpp:860) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2589382812 From thartmann at openjdk.org Tue Jan 14 09:24:53 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 14 Jan 2025 09:24:53 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v6] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: <-Yk7IjexTa4X8G1Cegq8fcxH6bDhB_BnTDS4uTdSVsQ=.c29895ee-7c1e-467f-af24-6265061fa439@github.com> On Mon, 16 Dec 2024 16:59:57 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. >> >> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. 
>> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of nmethod. Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > removing dead code Looks like this also triggers with `compiler/escapeAnalysis/TestFindInstMemRecursion.java` on Linux x64 and `-XX:-TieredCompilation -XX:+StressReflectiveCode -XX:-ReduceInitialCardMarks -XX:-ReduceBulkZeroing -XX:-ReduceFieldZeroing`. This change needs more testing before integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21276#issuecomment-2589407250 From vkempik at openjdk.org Tue Jan 14 09:58:46 2025 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 14 Jan 2025 09:58:46 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH [v2] In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 07:55:51 GMT, Fei Yang wrote: >> Hi, please consider this change. >> >> We have different base_offset (4 bytes instead of 8 bytes aligned) with COH. This causes misaligned memory accesses for several intrinsics like String.Compare or String.Equals.
The reason is that we assume 8-byte alignment and process one 8-byte word starting at the first array element for each iteration in the main loop. As a result, we have performance regressions on platforms with slow misaligned memory accesses like Unmatched and Premier P550 SBCs. >> PS: Same issue is there when `UseCompressedClassPointers` is disabled. >> >> Correctness test on linux-riscv64: >> - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (release) >> - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (release) >> - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (fastdebug) >> - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (fastdebug) >> >> Performance test on Premier P550 (-XX:+UseParallelGC -XX:+AlwaysPreTouch -Xms8g -Xmx8g): >> >> 1. SPECjbb2005 Score Without Patch >> 1.1 -XX:-UseCompactObjectHeaders: 32666 >> 1.2 -XX:+UseCompactObjectHeaders: 27610 >> >> 2. SPECjbb2005 Score With Patch >> 2.1 -XX:-UseCompactObjectHeaders: 32820 >> 2.2 -XX:+UseCompactObjectHeaders: 34179 > > Fei Yang has updated the pull request incrementally with one additional commit since the last revision: > > Fix assertions src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4466: > 4464: > 4465: if (multi_block) { > 4466: __ subi(consts, consts, vset_sew == Assembler::e32 ? 240 : 608); makes the code less readable, maybe keep total_adds but keep the change of addi to subi ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23053#discussion_r1914544385 From dlunden at openjdk.org Tue Jan 14 10:26:51 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 14 Jan 2025 10:26:51 GMT Subject: RFR: 8344130: C2: Avoid excessive hoisting in scheduler due to minuscule differences in block frequency In-Reply-To: References: Message-ID: On Wed, 18 Dec 2024 13:41:41 GMT, Daniel Lundén wrote: > `PhaseCFG::is_cheaper_block` can sometimes excessively hoist instructions through blocks due to minuscule differences in block frequency, even when the differences are likely caused by numerical imprecision in the block frequency computations. We saw an example of where such excessive hoisting stressed the register allocator in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295), but that issue was in fact two issues: one in the matcher (solved in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295)) and one in the scheduler (this issue). > > ### Changeset > > Add a small delta to the frequency comparison in `PhaseCFG::is_cheaper_block`. Note that a frequency comparison using the delta is already available in the function when making sure a hoist due to latency does not result in a higher (worse) frequency. I cannot see any reason for why we should not use the same delta in the first block frequency comparison. > > I do not include a regression test since I have not found a good one specific to this issue. I have verified that this fix is an alternative solution to solve the failure in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) (for which tests are already present). I also documented the verification steps in the issue description in JBS. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/12181425502) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. 
> - Performance testing using DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No significant improvements nor regressions. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22810#issuecomment-2589536153 From dlunden at openjdk.org Tue Jan 14 10:26:52 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Tue, 14 Jan 2025 10:26:52 GMT Subject: Integrated: 8344130: C2: Avoid excessive hoisting in scheduler due to minuscule differences in block frequency In-Reply-To: References: Message-ID: <_eOY-VBHonzuNXmcKvfmXJ86naZhRIchIsdYLEq91xw=.0d24f7a4-adbb-48b5-888d-664be240158b@github.com> On Wed, 18 Dec 2024 13:41:41 GMT, Daniel Lundén wrote: > `PhaseCFG::is_cheaper_block` can sometimes excessively hoist instructions through blocks due to minuscule differences in block frequency, even when the differences are likely caused by numerical imprecision in the block frequency computations. We saw an example of where such excessive hoisting stressed the register allocator in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295), but that issue was in fact two issues: one in the matcher (solved in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295)) and one in the scheduler (this issue). > > ### Changeset > > Add a small delta to the frequency comparison in `PhaseCFG::is_cheaper_block`. Note that a frequency comparison using the delta is already available in the function when making sure a hoist due to latency does not result in a higher (worse) frequency. I cannot see any reason for why we should not use the same delta in the first block frequency comparison. > > I do not include a regression test since I have not found a good one specific to this issue. I have verified that this fix is an alternative solution to solve the failure in [JDK-8331295](https://bugs.openjdk.org/browse/JDK-8331295) (for which tests are already present). 
I also documented the verification steps in the issue description in JBS. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/12181425502) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Performance testing using DaCapo, SPECjbb, and SPECjvm on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. No significant improvements nor regressions. This pull request has now been integrated. Changeset: cbb2b847 Author: Daniel Lundén URL: https://git.openjdk.org/jdk/commit/cbb2b847e48c970297c2142a0675918b364e7987 Stats: 6 lines in 1 file changed: 3 ins; 1 del; 2 mod 8344130: C2: Avoid excessive hoisting in scheduler due to minuscule differences in block frequency Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22810 From fyang at openjdk.org Tue Jan 14 10:55:03 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Jan 2025 10:55:03 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH [v3] In-Reply-To: References: Message-ID: > Hi, please consider this change. > > We have different base_offset (4 bytes instead of 8 bytes aligned) with COH. This causes misaligned memory accesses for several intrinsics like String.Compare or String.Equals. The reason is that we assume 8-byte alignment and process one 8-byte word starting at the first array element for each iteration in the main loop. As a result, we have performance regressions on platforms with slow misaligned memory accesses like Unmatched and Premier P550 SBCs. > PS: Same issue is there when `UseCompressedClassPointers` is disabled. 
> > Correctness test on linux-riscv64: > - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (release) > - [x] tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (release) > - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders") (fastdebug) > - [x] hotspot:tier1 (TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:-UseCompactObjectHeaders") (fastdebug) > > Performance test on Premier P550 (-XX:+UseParallelGC -XX:+AlwaysPreTouch -Xms8g -Xmx8g): > > 1. SPECjbb2005 Score Without Patch > 1.1 -XX:-UseCompactObjectHeaders: 32666 > 1.2 -XX:+UseCompactObjectHeaders: 27610 > > 2. SPECjbb2005 Score With Patch > 2.1 -XX:-UseCompactObjectHeaders: 32820 > 2.2 -XX:+UseCompactObjectHeaders: 34179 Fei Yang has updated the pull request incrementally with one additional commit since the last revision: Comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23053/files - new: https://git.openjdk.org/jdk/pull/23053/files/23d9add9..caa2d488 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23053&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23053&range=01-02 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23053/head:pull/23053 PR: https://git.openjdk.org/jdk/pull/23053 From fyang at openjdk.org Tue Jan 14 10:55:04 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 14 Jan 2025 10:55:04 GMT Subject: RFR: 8347489: RISC-V: Misaligned memory access with COH [v2] In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 09:56:18 GMT, Vladimir Kempik wrote: >> Fei Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix assertions > > src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4466: > >> 4464: >> 4465: if (multi_block) { >> 4466: __ subi(consts, consts, vset_sew == 
Assembler::e32 ? 240 : 608); > > makes the code less readable, maybe keep total_adds but keep the change of addi to subi ? Sure. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23053#discussion_r1914619459 From chagedorn at openjdk.org Tue Jan 14 10:58:39 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Jan 2025 10:58:39 GMT Subject: RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 15:27:26 GMT, Kangcheng Xu wrote: >> The Java Fuzzer found a wrong execution which could be traced back to [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) which went into JDK 24. Since we are close to RDP 2, a backout seems the safest option for JDK 24. A REDO can be done for JDK 25 @tabjy. >> >> Details about the wrong execution and how to trigger it can be found in the JBS description of this backout. >> >> The backout applied cleanly. >> >> Thanks, >> Christian > > I'll look into this. Sorry for having you to roll back the changes! No worries @tabjy! :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/23066#issuecomment-2589604868 From qamai at openjdk.org Tue Jan 14 11:06:52 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 14 Jan 2025 11:06:52 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Tue, 14 Jan 2025 06:45:41 GMT, Xiaohong Gong wrote: >> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: >> >> Implement apply_identity > > src/hotspot/share/opto/phaseX.cpp line 2301: > >> 2299: // that may undo the changes done during lowering. >> 2300: >> 2301: return k->LoweredIdeal(this); > > I'm sorry that I still cannot understand well what this method is expected to do for a node. 
For example, if we need to add some architecture specific optimization for `MulNode` like AArch64, we can add the lowering code in `lower_node_platform` for AArch64, right? Do we also need to override the `LoweredIdeal()` for `MulNode` ? Thanks! `lower_node_transform` transforms a node that should not appear in matching to something that can appear there while `LoweredIdeal` transforms a node that may appear in matching to another based on the pattern of its input. For example, consider this Java code: Int256Vector v1; Int256Vector v2 = v1.withLane(4, x); Int256Vector v3 = v2.withLane(5, y); Before lowering we would have (pseudocode for the graph): vector v1; vector v2 = VectorInsert(v1, x, 4); vector v3 = VectorInsert(v2, y, 5); x86 does not know how to insert to a 256-bit vector, so we need to extract the 128-bit lane, insert the element into the lane, then insert the lane into the original vector. Currently, this is done during code emission, suppose we want to do so during lowering, we will have this: vector v1; // [a, b, c, d, e, f, g, h] vector v4 = ExtractVector(v1, 1); // [e, f, g, h] vector v5 = VectorInsert(v4, x, 0); // [x, f, g, h] vector v2 = VectorInsert(v1, v5, 1); // [a, b, c, d, x, f, g, h] vector v6 = ExtractVector(v2, 1); // [x, f, g, h] vector v7 = VectorInsert(v6, y, 1); // [x, y, g, h] vector v3 = VectorInsert(v2, v7, 1); // [a, b, c, d, x, y, g, h] Now using `Identity` we may be able to ensure that `v6 == v5`, this leaves us with: vector v1; // [a, b, c, d, e, f, g, h] vector v4 = ExtractVector(v1, 1); // [e, f, g, h] vector v5 = VectorInsert(v4, x, 0); // [x, f, g, h] vector v2 = VectorInsert(v1, v5, 1); // [a, b, c, d, x, f, g, h] vector v7 = VectorInsert(v5, y, 1); // [x, y, g, h] vector v3 = VectorInsert(v2, v7, 1); // [a, b, c, d, x, y, g, h] Ideally, we would want to transform `v3` into `VectorInsert(v1, v7, 1)` because then we can elide `v2`. This can be done using `LoweredIdeal`. 
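To make the equivalence concrete, here is a small sketch that models the graphs above with plain arrays. It is only an illustration under assumptions: `Vec256`, `Vec128`, `extract_half`, `insert_element`, `insert_half`, `lowered_naive` and `lowered_optimized` are hypothetical names, not C2 nodes or HotSpot functions. It shows that the graph where `v3` is built directly from `v1` and `v7` produces the same lanes as the straightforward lowering.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the vector values in the pseudocode above.
using Vec256 = std::array<int, 8>;  // eight int lanes
using Vec128 = std::array<int, 4>;  // one 128-bit half: four int lanes

// Models ExtractVector(v, half): copy out one 128-bit half.
Vec128 extract_half(const Vec256& v, std::size_t half) {
  assert(half < 2);
  Vec128 r{};
  for (std::size_t i = 0; i < 4; ++i) r[i] = v[half * 4 + i];
  return r;
}

// Models VectorInsert of a scalar into a 128-bit vector.
Vec128 insert_element(Vec128 v, int x, std::size_t idx) {
  assert(idx < 4);
  v[idx] = x;
  return v;
}

// Models VectorInsert of a 128-bit half into a 256-bit vector.
Vec256 insert_half(Vec256 v, const Vec128& h, std::size_t half) {
  assert(half < 2);
  for (std::size_t i = 0; i < 4; ++i) v[half * 4 + i] = h[i];
  return v;
}

// Straightforward lowering: each withLane becomes extract, insert, re-insert.
Vec256 lowered_naive(const Vec256& v1, int x, int y) {
  Vec128 v4 = extract_half(v1, 1);
  Vec128 v5 = insert_element(v4, x, 0);
  Vec256 v2 = insert_half(v1, v5, 1);
  Vec128 v6 = extract_half(v2, 1);  // same lanes as v5
  Vec128 v7 = insert_element(v6, y, 1);
  return insert_half(v2, v7, 1);
}

// After both rewrites: v6 is replaced by v5, and v3 is built from v1,
// so the intermediate 256-bit value v2 is no longer needed.
Vec256 lowered_optimized(const Vec256& v1, int x, int y) {
  Vec128 v4 = extract_half(v1, 1);
  Vec128 v5 = insert_element(v4, x, 0);
  Vec128 v7 = insert_element(v5, y, 1);
  return insert_half(v1, v7, 1);
}
```

Calling both functions on the same `v1`, `x` and `y` should give identical results, which is what makes dropping `v2` safe in this model.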
So to your question, I think `LoweredIdeal` would be a better choice, this aligns pretty well with our current method of doing it in `Ideal`, too. > src/hotspot/share/opto/phaseX.cpp line 2310: > >> 2308: Node* PhaseLowering::lower_node(Node* n) { >> 2309: // Apply shared lowering transforms >> 2310: > > Per my understanding, this is a backend specific lowering phase, is there any scenario that a platform in-dependent lowering is needed here? As we already have the common GVN phase for common node idealize, is there any difference for such shared transformations? Thanks! Lowering is not idealisation so I think having backend independent lowering is fine in case we need it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1914633526 PR Review Comment: https://git.openjdk.org/jdk/pull/21599#discussion_r1914634733 From epeter at openjdk.org Tue Jan 14 11:58:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 11:58:41 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: <0cn00I7riM8zzrsAUK6GihWPQ9j4NajBUnTA_6q5bf0=.09e73ab2-66b0-4459-b752-b52b4fbc6f52@github.com> On Fri, 10 Jan 2025 06:20:08 GMT, Emanuel Peter wrote: > `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. > > However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). > > Why did we not hit this assert before? > `LoopLimitNode` needs to have constant inputs. 
We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. > > But now we found a second case: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 > > In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 > > In the regression test, we have this example: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 > > We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. > > Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. > > I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: > > if (in(Init)->is_Con() && in(Limit)->is_Con()) > return nullptr; // Value > > The assumption seems to be that we want `Value` to do the constant folding here - but of course we di... 
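For readers following along, the overflow guard described in the quoted text can be sketched roughly as below. This is a standalone illustration under assumptions, not the HotSpot implementation: `fold_loop_limit` is a hypothetical helper, the rounding formula is one common way to round the distance up to a whole number of strides, and a range check is used in place of the cast-and-compare `final_con == (jlong)final_int`, which tests the same condition.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper, not HotSpot code. Computes the final loop value for
// constant init/limit/stride in 64-bit arithmetic. Returns true and stores the
// value in *result if it fits in a 32-bit int; returns false on overflow,
// in which case the caller should not constant-fold.
// Assumes the limit lies in the direction of the stride (init <= limit for a
// positive stride, init >= limit for a negative one).
bool fold_loop_limit(int32_t init, int32_t limit, int32_t stride, int32_t* result) {
  assert(stride != 0 && "a counted loop needs a non-zero stride");
  assert(result != nullptr);
  const int64_t init_l = init;
  const int64_t limit_l = limit;
  const int64_t stride_l = stride;
  // Round the distance up to a whole number of strides.
  const int64_t stride_m = stride_l - (stride_l > 0 ? 1 : -1);
  const int64_t trip_count = (limit_l - init_l + stride_m) / stride_l;
  const int64_t final_l = init_l + stride_l * trip_count;
  // Equivalent to narrowing to int and comparing with the 64-bit value.
  if (final_l < std::numeric_limits<int32_t>::min() ||
      final_l > std::numeric_limits<int32_t>::max()) {
    return false;
  }
  *result = static_cast<int32_t>(final_l);
  return true;
}
```

With the inputs from the regression test discussion, `init=0, limit=1000, stride=4` folds, while `init=0, limit=2147483647, stride=4` rounds up past the int range and is rejected.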
I filed https://bugs.openjdk.org/browse/JDK-8347701 ------------- PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2589720218 From epeter at openjdk.org Tue Jan 14 11:58:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 11:58:43 GMT Subject: Integrated: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: On Fri, 10 Jan 2025 06:20:08 GMT, Emanuel Peter wrote: > `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. > > However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). > > Why did we not hit this assert before? > `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. > > But now we found a second case: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 > > In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". 
> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 > > In the regression test, we have this example: > https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 > > We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. > > Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? > We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. > > I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: > > if (in(Init)->is_Con() && in(Limit)->is_Con()) > return nullptr; // Value > > The assumption seems to be that we want `Value` to do the constant folding here - but of course we di... This pull request has now been integrated. 
Changeset: f0af830f Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f0af830f850669af411a3893f783e4b9917ed318 Stats: 85 lines in 2 files changed: 76 ins; 3 del; 6 mod 8335747: C2: fix overflow case for LoopLimit with constant inputs Reviewed-by: kvn, qamai ------------- PR: https://git.openjdk.org/jdk/pull/23024 From epeter at openjdk.org Tue Jan 14 11:58:41 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 14 Jan 2025 11:58:41 GMT Subject: RFR: 8335747: C2: fix overflow case for LoopLimit with constant inputs In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 13:25:38 GMT, Quan Anh Mai wrote: >> `LoopLimitNode::Value` tries to constant-fold when it has constant inputs. However, there can be an overflow in the int-computation, but we check for it with `if (final_con == (jlong)final_int) {` and do not constant fold in that case. >> >> However, there was an `assert` that checked that such an overflow would never be encountered. We already had to make an exception for this assert during PhaseCCP with [JDK-8309266](https://bugs.openjdk.org/browse/JDK-8309266). >> >> Why did we not hit this assert before? >> `LoopLimitNode` needs to have constant inputs. We used to assume that if the constants would lead to an overflow, then the loop-limit-check would also get similar constants, and detect that `limit <= max_int-stride` does not hold, and it would constant-fold away the loop, together with the `LoopLimitNode`. >> >> But now we found a second case: >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopnode.cpp#L2555-L2563 >> >> In `PhaseIdealLoop::split_thru_phi`, we temporarily split the `LoopLimitNode` through the phi, generating a new `LoopLimitNode` for each branch of the `phi`. We then call `Value` on it to see if that leads us to constant fold one of the branches, which would be considered a "win". 
>> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/src/hotspot/share/opto/loopopts.cpp#L87-L105 >> >> In the regression test, we have this example: >> https://github.com/openjdk/jdk/blob/d93d1a3b58728f7978bbd5824b1bf6493b42561e/test/hotspot/jtreg/compiler/loopopts/TestLoopLimitOverflowDuringSplitThruPhi.java#L44-L69 >> >> We generate a temporary clone of `LoopLimitNode(init=0, limit=x, stride=4)` (would not constant fold because of variable `x = Phi(1000, 2147483647)`), which happens to be `LoopLimitNode(init=0, limit=2147483647, stride=4)`. We evaluate `Value` on this temporary clone, and hit the overflow case. >> >> Why is it ok to just remove the assert and allow `LoopLimitNode` to overflow? >> We still have the loop limit check, which checks that `limit <= max_int-stride`, and this means we would never enter the loop if we took the `Phi` branch that led to the overflow. >> >> I could not just remove the assert, because in `LoopLimitNode::Ideal` we have this (strange?) check that does not optimize the `LoopLimitNode` if the inputs are constants: >> >> if (in(Init)->is_Con() && in(Limit)->is_Con()) >> return nullptr; // Value >> >> The assumption seems to be that we want `Value`... > > You are right, but I believe the resulting expansion will constant fold regardless. So, why do we need to reject constant folding of the `LoopLimitNode` in the presence of overflow? Thanks @merykitty @vnkozlov for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23024#issuecomment-2589720728 From roland at openjdk.org Tue Jan 14 12:46:44 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Jan 2025 12:46:44 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v8] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 13:27:58 GMT, Quan Anh Mai wrote: > Can you inject the iteration count into the created loop so that it can avoid strip mining? 
Some loops that are not covered by this change would likely benefit from not being strip mined if possible. So I think it would be better to address that separately (and make sure it plays well with this change). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2589818895 From roland at openjdk.org Tue Jan 14 12:50:49 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 14 Jan 2025 12:50:49 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v8] In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 12:44:25 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. 
>>
>> - I added a new deoptimization reason (Reason_short_running_loop) for
>> the new predicate. Given the number of iterations is narrowed down
>> by the predicate, the limit of the loop after transformation is a
>> cast node that's control dependent on the short running loop
>> predicate. Because once the counted loop is transformed, it is
>> likely that range check predicates will be inserted and they will
>> depend on the limit, the short running loop predicate has to be the
>> one that's furthest away from the loop entry. Now it is also possible
>> that the limit before transformation depends on a predicate
>> (TestShortRunningLongCountedLoopPredicatesClone is an example), so we
>> can have: new predicates inserted after the transformation that
>> depend on the casted limit, which itself depends on old predicates
>> added before the transformation. To solve this circular dependency,
>> parse and assert predicates are cloned between the old predicates
>> and the loop head. The cloned short running loop parse predicate is
>> the one that's used to insert the short running loop predicate.
>>
>> - In the case of a long counted loop, the loop is transformed into a
>> regular loop with a ...
>
> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits:
>
> - refactor
> - Merge branch 'master' into JDK-8342692
> - Merge branch 'master' into JDK-8342692
> - Merge branch 'master' into JDK-8342692
> - Merge branch 'master' into JDK-8342692
> - review
> - reviews
> - Update src/hotspot/share/opto/loopTransform.cpp
>
> Co-authored-by: Emanuel Peter
> - Merge branch 'master' into JDK-8342692
> - whitespaces
> - ... and 19 more: https://git.openjdk.org/jdk/compare/3b9732ed...0f137359

There are some failures with `compiler/loopopts/superword/TestMemorySegment.java` in the test runs.
Given https://github.com/openjdk/jdk/pull/21926 makes some changes that affect that test, I think it's easier to ignore those failures for now and revisit once 21926 integrates. AFAICT `TestMemorySegment.java` will need some small adjustments then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2589828361 From jbhateja at openjdk.org Tue Jan 14 13:09:45 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 14 Jan 2025 13:09:45 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v9] In-Reply-To: References: <_SCKY9fuTqNDfR6K1y-FuMvursDMuOx39sKrXMj0Tdg=.225da2f1-fcdc-4418-a753-6d7404b4a83e@github.com> Message-ID: On Mon, 13 Jan 2025 16:51:02 GMT, Paul Sandoz wrote: >> Hi @PaulSandoz , In the current scheme we are passing unboxed carriers to intrinsic entry point, in the fallback implementation carrier type is first converted to floating point value using Float.float16ToFloat API which expects to receive a short type argument, after the operation we again convert float value to carrier type (short) using Float.floatToFloat16 API which expects a float argument, thus our intent here is to perform unboxing and boxing outside the intrinsic thereby avoiding all complexities around boxing by compiler. Even if we pass 3 additional parameters we still need to use Float16.floatValue which invokes Float.float16ToFloat underneath, thus this minor modification on Java side is on account of optimizing the intrinsic interface. > > Yes, i understand the approach. 
It's about clarity of the fallback implementation retaining what was expressed in the original code:
>
> short res = Float16Math.fma(fa, fb, fc, a, b, c,
>     (a_, b_, c_) -> {
>         double product = (double)(a_.floatValue() * b_.floatValue());
>         return valueOf(product + c_.doubleValue());
>     });

Hi @PaulSandoz,

In the above code snippet the return type 'short' of the intrinsic call does not match the value being returned, which is of box type, thereby mandating additional glue code.

Regular primitive type boxing APIs are lazily intrinsified, thereby generating an intrinsifiable Call IR during parsing. LoadNode's idealization can fetch a boxed value from the input of the boxing call IR and directly forward it to users.

Q1. What is the problem in directly passing Float16 boxes to FMA and SQRT intrinsic entry points?

A. The compiler will have to unbox them before the actual operation. There are multiple schemes to perform unboxing, such as name-based, offset-based, and index-based field lookup. Vector API unbox expansion uses an offset-based payload field lookup; for this it bookkeeps the payload's offset in the runtime representation of the VectorPayload class created as part of VM initialization. However, the VM can only bookkeep this information for classes that are part of the java.base module; Float16, being part of an incubator module, cannot use offset-based field lookup. Thus the only viable alternative is to unbox using a field name/index based lookup. For this the compiler will first verify that the incoming oop is of Float16 type and then use a hardcoded name-based lookup to load the field value. This looks fragile as it establishes an unwanted dependency between Float16 field names and the compiler implementation. The same applies to index-based lookup, as index values depend on the combined field count of class and instance-specific fields; thus any addition or deletion of a class-level static helper field before the field of interest can invalidate any hardcoded index value used by the compiler.
All in all, for safe and reliable unboxing by the compiler, it's necessary to create an upfront VM representation like vector_VectorPayload.

Q2. What are the pros and cons of passing both the unboxed value and boxed values to the intrinsic entry point?

A. Pros:
- This will save an unsafe unboxing implementation if the holder class is not part of the java.base module.
- We can leverage the existing box intrinsification infrastructure, which directly forwards the embedded values to its users.
- Also, it will minimize the changes in the Java side implementation.

Cons:
- It's suboptimal in case the call is neither intrinsified nor inlined, as it will add additional spills before the call.

Q3. The primitive box class boxing API 'valueOf' accepts an argument of the corresponding primitive type. How different are the Float16 boxing APIs?

A. Unlike primitive box classes, Float16 has multiple boxing APIs and none of them accept a short type argument.

    public static Float16 valueOf(int value)
    public static Float16 valueOf(long value)
    public static Float16 valueOf(float f)
    public static Float16 valueOf(double d)
    public static Float16 valueOf(String s) throws NumberFormatException
    public static Float16 valueOf(BigDecimal v)
    public static Float16 valueOf(BigInteger bi)

Thus, we need to add special handling to first downcast the parameter value to the short type carrier, otherwise it will pose problems in forwarding the boxed values. Existing LoadNode idealization directly forwards the input of unboxed Call IR to its users. To use the existing idealization, we need to massage the input of unboxed Call IR to the exact carrier size, so it's not a meager one-line change in the following methods to enable seamless intrinsification of Float16 boxing APIs.
    bool ciMethod::is_boxing_method() const
    bool ciMethod::is_unboxing_method() const

Given the above observations, passing 3 additional box arguments to the intrinsic and returning a box value needs additional changes in the compiler, while a minor restructuring of the Java implementation, packed within the glue logic, looks like a reasonable approach.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1914782512

From tweidmann at openjdk.org Tue Jan 14 14:30:08 2025
From: tweidmann at openjdk.org (Theo Weidmann)
Date: Tue, 14 Jan 2025 14:30:08 GMT
Subject: RFR: 8325030: PhaseMacroExpand::value_from_mem_phi assert with "unknown node on this path"
Message-ID:

The following code triggers an assert:

    class Test {
        static class A {
            int a;
        }

        public static void main(String[] strArr) {
            int i2 = 0;
            for (int i = 0; i < 50; ++i)
                try {
                    synchronized (new A()) {
                        synchronized (Test.class) {
                            for (int var19 = 0; var19 < Integer.valueOf(i2);) {
                                Integer.valueOf(var19);
                            }
                        }
                    }
                    for (int var8 = 0; var8 < 10000; ++var8) ;
                } catch (ArithmeticException a_e) {
                }
        }
    }

    # Internal Error (/Users/theo/jdk/open/src/hotspot/share/opto/macro.cpp:435), pid=37385, tid=27651
    # assert(false) failed: unknown node on this path

when run with `-XX:-ProfileExceptionHandlers`. With different flag configurations this can be reproduced back as far as JDK 11. (See issue in JBS.)

The issue arises during OSR compilation in macro expansion (`PhaseMacroExpand::eliminate_macro_nodes`):

During `eliminate_macro_nodes`, the boxing call `Integer.valueOf(var19)` (which can be seen in the graph below as node 359) is eliminated:

[Screenshot 2025-01-14 at 14 56 27]

Note that on the catch path we have a MemBarReleaseLock node (462) whose control and memory inputs become top after 359 CallStaticJava is eliminated. This elimination is correct per se.
`PhaseMacroExpand::eliminate_macro_nodes` will then continue with the next macro node in its list, the allocation `new A()`, which it tries to eliminate using scalar replacement. Note that IGVN does not run in between these macro elimination attempts. During scalar replacement, while finding the values for each field and walking along the memory edges, 462 MemBarReleaseLock, whose memory input is still top, is hit in `PhaseMacroExpand::value_from_mem_phi`. `PhaseMacroExpand::value_from_mem_phi` does not expect top and the assert triggers.

A naive solution for this problem would be to run IGVN in between each macro elimination attempt in order to ensure the entire catch path with 462 MemBarReleaseLock is removed. This does indeed solve the problem, but the performance impact of running IGVN in between each macro elimination is unclear.

Instead, we observe that `PhaseMacroExpand::value_from_mem_phi` tries to convert a memory phi to a value phi node. It is fine for a PhiNode to have top as an input, so our fix is simply to add top as the input to the phi node when top is found in `PhaseMacroExpand::value_from_mem_phi`.

After all macro eliminations another IGVN run occurs. In this run, the catch path with 462 MemBarReleaseLock dies and top propagates down into the Region node belonging to the phi node into which we stored top. Through idealization this path (and thus the corresponding input) is removed from the RegionNode and PhiNode.
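
As a rough illustration of the idea only, here is a toy Java model. It is not HotSpot code and not the actual patch: `TopInPhiSketch`, `Store`, `TOP`, `valueFromMemPhi` and `removeDeadInputs` are hypothetical names made up for this sketch. It shows a conversion step that passes a dead (top) input through instead of failing, and a later cleanup step, standing in for the subsequent IGVN run, that drops the dead input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical classes, not HotSpot's Node/PhiNode/RegionNode.
public class TopInPhiSketch {
    // Sentinel standing in for C2's "top" (an input on a dead path).
    static final Object TOP = new Object() {
        @Override
        public String toString() {
            return "top";
        }
    };

    // Memory state on one incoming path: a store of a known field value.
    static final class Store {
        final int value;

        Store(int value) {
            this.value = value;
        }
    }

    // Builds the inputs of a "value phi" from the inputs of a "memory phi".
    // A dead path is passed through as TOP rather than treated as an error.
    static List<Object> valueFromMemPhi(List<Object> memInputs) {
        List<Object> values = new ArrayList<>();
        for (Object mem : memInputs) {
            if (mem == TOP) {
                values.add(TOP); // tolerated here, removed by the later cleanup
            } else if (mem instanceof Store) {
                values.add(((Store) mem).value);
            } else {
                throw new IllegalStateException("unknown node on this path");
            }
        }
        return values;
    }

    // Stand-in for the later cleanup pass: inputs on dead paths are dropped.
    static List<Object> removeDeadInputs(List<Object> phiInputs) {
        List<Object> live = new ArrayList<>();
        for (Object in : phiInputs) {
            if (in != TOP) {
                live.add(in);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<Object> mem = Arrays.asList(new Store(1), TOP, new Store(3));
        List<Object> phi = valueFromMemPhi(mem);
        System.out.println(phi);
        System.out.println(removeDeadInputs(phi));
    }
}
```

In this model the conversion never has to know why a path is dead; it only has to avoid choking on it, which mirrors the reasoning above that running a full cleanup between every elimination attempt is not required for correctness.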
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/23104/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23104&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325030 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23104.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23104/head:pull/23104 PR: https://git.openjdk.org/jdk/pull/23104 From chagedorn at openjdk.org Tue Jan 14 15:23:55 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Jan 2025 15:23:55 GMT Subject: [jdk24] RFR: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 15:12:07 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [062f2dcf](https://github.com/openjdk/jdk/commit/062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Christian Hagedorn on 13 Jan 2025 and was reviewed by Tobias Hartmann. > > Thanks! Thanks Vladimir and Tobias for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23077#issuecomment-2590222775 From chagedorn at openjdk.org Tue Jan 14 15:23:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Jan 2025 15:23:56 GMT Subject: [jdk24] Integrated: 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 15:12:07 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [062f2dcf](https://github.com/openjdk/jdk/commit/062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Christian Hagedorn on 13 Jan 2025 and was reviewed by Tobias Hartmann. > > Thanks! This pull request has now been integrated. 
Changeset: f42e2c10 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/f42e2c10c6f3e0f17bf9ada10dea0b98a29a39c8 Stats: 414 lines in 3 files changed: 0 ins; 414 del; 0 mod 8347554: [BACKOUT] C2: implement optimization for series of Add of unique value Reviewed-by: kvn, thartmann Backport-of: 062f2dcfe5b62cc3dd3c292eeebd7a7ac78f849a ------------- PR: https://git.openjdk.org/jdk/pull/23077 From tweidmann at openjdk.org Tue Jan 14 15:35:07 2025 From: tweidmann at openjdk.org (Theo Weidmann) Date: Tue, 14 Jan 2025 15:35:07 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v17] In-Reply-To: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> References: <9KAt8WCvEM-gqrqLzmAq63yolr4hu_hghU4q-9Uf1I4=.24ed2501-7e3f-46fd-a4bc-1d6fe448a17a@github.com> Message-ID: > This PR is a refactoring and partial rewrite of https://github.com/openjdk/jdk/pull/22716 by @eme64. The goals remain the same: > >> For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. >> >> I would like to start a collection of such generators, that can then be used in testing. >> >> The goal is to grow this collection in the future, and add new types. For example byte, char, short, or even Float16. >> >> This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. >> >> Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 > > The refactoring makes use of generics, rendering the generators library more flexible by default, by allowing it work with arbitrary types (with special features for Comparable types), improving the composability of different generators and streamlining the client API for simplicity. 
This allows test authors to quickly compose their own distributions and generators if necessary. An overview of this functionality is provided in the `Generators` javadoc. Theo Weidmann has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary type postfixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22941/files - new: https://git.openjdk.org/jdk/pull/22941/files/9b05b639..a071500e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22941&range=15-16 Stats: 12 lines in 2 files changed: 0 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/22941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22941/head:pull/22941 PR: https://git.openjdk.org/jdk/pull/22941 From chagedorn at openjdk.org Tue Jan 14 16:24:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 14 Jan 2025 16:24:43 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation [v17] In-Reply-To: